Author: Jason Tompkins
Course: IST 718 Big Data Analysis
Date: August 19, 2021
The Syracuse Real Estate Investment Trust (SREIT) would like to select a geographic area to focus their investment activities. The primary goal is to find an area with higher-than-average return on investment (ROI). The secondary goal will be to select an area where the growth is relatively stable.
The Zillow Home Value Index contains that monthly median single family residence price for 30,464 zip codes. A significant amount of the historical data goes back as far as January 1996. Using 24 years of SFR price data will allow SREIT to analyze and forecast the viability of real estate investments across all 30,464 zip codes.
Data Sources:
In addition to analyzing all the zip code data, SREIT committee would like an area analysis of four Arkansas Metro areas; Hot Springs, Little Rock, Fayetteville, and Searcy. It has been noted that many of the Metro areas have seen strong growth. Not only is there interest in potentially investing in that region, but also using the Arkansas Metro areas as a basis for comparison to other metro areas that have experienced a high rate of growth.
Forecasting tools will be used to project the probable investment growth for high potential metro areas. The result of this analysis will be a recommendation for three specific zip codes in the same metro area for the committee to focus on.
# Load Packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from fbprophet import Prophet
import plotly.express as px
import plotly.graph_objects as go
# Read in and inspect the Zip Code Data file
data = pd.read_csv('Zip_Zhvi_SingleFamilyResidence.csv')
print(data.head(10))
print('Data Shape:', data.shape)
RegionID SizeRank RegionName RegionType StateName State City \
0 61639 0 10025 Zip NY NY New York
1 84654 1 60657 Zip IL IL Chicago
2 61637 2 10023 Zip NY NY New York
3 91982 3 77494 Zip TX TX Katy
4 84616 4 60614 Zip IL IL Chicago
5 91940 5 77449 Zip TX TX Katy
6 61616 6 10002 Zip NY NY New York
7 91733 7 77084 Zip TX TX Houston
8 93144 8 79936 Zip TX TX El Paso
9 84640 9 60640 Zip IL IL Chicago
Metro CountyName 1/31/1996 ... \
0 New York-Newark-Jersey City New York County NaN ...
1 Chicago-Naperville-Elgin Cook County 364892.0 ...
2 New York-Newark-Jersey City New York County NaN ...
3 Houston-The Woodlands-Sugar Land Harris County 200475.0 ...
4 Chicago-Naperville-Elgin Cook County 546663.0 ...
5 Houston-The Woodlands-Sugar Land Harris County 97521.0 ...
6 New York-Newark-Jersey City New York County NaN ...
7 Houston-The Woodlands-Sugar Land Harris County 97381.0 ...
8 El Paso El Paso County 82374.0 ...
9 Chicago-Naperville-Elgin Cook County 254388.0 ...
6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \
0 1413747 1405862 1402547 1390420 1381621 1375725
1 974693 975616 975734 975251 974238 973104
2 1528603 1514894 1502233 1492429 1486122 1480426
3 335536 335878 335940 336092 336119 336083
4 1207765 1208853 1208481 1206304 1204013 1201182
5 184389 185268 185951 186457 187057 187978
6 1300168 1292031 1285381 1278986 1269337 1260484
7 182153 182842 183402 183817 184460 185299
8 132739 133136 133509 133537 133581 133748
9 722778 723134 724125 725136 727643 728915
12/31/2019 1/31/2020 2/29/2020 3/31/2020
0 1374714 1381453 1385737 1389268
1 971908 972038 973671 975642
2 1476509 1478980 1479301 1474994
3 336154 335860 336037 336483
4 1198879 1198277 1199900 1200980
5 188925 189522 190210 191469
6 1249923 1245880 1236317 1228566
7 186196 186740 187491 188738
8 134305 134950 135433 135692
9 729467 730647 730647 729824
[10 rows x 300 columns]
Data Shape: (30464, 300)
One of the challenges with the ZHVI data is that not all the zip codes have complete data for the 24-year period between 1996 and 2020. In an effort to preserve all of the observations, a subset was created to compare the growth rate of property values over a five-year period, 2015-2020.
The mean five-year ROI for single family residences was 26.2% with a 16.6% standard deviation. The median ROI was 24.5%. The upper quartile for ROI is between 33.9% and a dizzying 248%. The upper quartile was sorted to rank the top 30 zip codes in terms of ROI.
There are several geographic areas of note including Columbus-Ohio, Tampa-Florida, Beverly Hills-California, and Tacoma-Washington. The most significant metro areas are ones with several high growth zip codes in the top 30. Atlanta-Sandy Springs-Roswell in Georgia has six zip codes in the top 30. Dallas-Fort Worth-Arlington in Texas has ten.
# Select relavant attributes
dataID = data[['RegionName', 'City', 'Metro']]
dataX = data.iloc[:, 240:300]
# Concatonate the columns to one data frame
dataSubset = pd.concat([dataID, dataX.reindex(dataID.index)], axis=1)
print(dataSubset.head())
print('Data Shape:', dataSubset.shape)
RegionName City Metro 4/30/2015 \ 0 10025 New York New York-Newark-Jersey City 1356525.0 1 60657 Chicago Chicago-Naperville-Elgin 903862.0 2 10023 New York New York-Newark-Jersey City 1495385.0 3 77494 Katy Houston-The Woodlands-Sugar Land 334607.0 4 60614 Chicago Chicago-Naperville-Elgin 1098788.0 5/31/2015 6/30/2015 7/31/2015 8/31/2015 9/30/2015 10/31/2015 ... \ 0 1350097.0 1355930.0 1362446.0 1375476.0 1380169.0 1388270.0 ... 1 912966.0 915000.0 916602.0 920152.0 923585.0 923550.0 ... 2 1494490.0 1506862.0 1513489.0 1515314.0 1517631.0 1517245.0 ... 3 336254.0 337852.0 338657.0 339058.0 339256.0 339524.0 ... 4 1110803.0 1114983.0 1119110.0 1123493.0 1126324.0 1126873.0 ... 6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \ 0 1413747 1405862 1402547 1390420 1381621 1375725 1 974693 975616 975734 975251 974238 973104 2 1528603 1514894 1502233 1492429 1486122 1480426 3 335536 335878 335940 336092 336119 336083 4 1207765 1208853 1208481 1206304 1204013 1201182 12/31/2019 1/31/2020 2/29/2020 3/31/2020 0 1374714 1381453 1385737 1389268 1 971908 972038 973671 975642 2 1476509 1478980 1479301 1474994 3 336154 335860 336037 336483 4 1198879 1198277 1199900 1200980 [5 rows x 63 columns] Data Shape: (30464, 63)
#dataSubset.isnull().sum()
dataSubset.City.fillna(value = "Unidentified")
0 New York
1 Chicago
2 New York
3 Katy
4 Chicago
...
30459 Charlotte Amalie
30460 Choudrant
30461 Choudrant
30462 Logan
30463 Granby
Name: City, Length: 30464, dtype: object
dataSubset.dropna(inplace=True)
#dataSubset.info
dataSubset.dtypes
RegionName int64
City object
Metro object
4/30/2015 float64
5/31/2015 float64
...
11/30/2019 int64
12/31/2019 int64
1/31/2020 int64
2/29/2020 int64
3/31/2020 int64
Length: 63, dtype: object
dataSubset['growth5yr'] = ((dataSubset['3/31/2020'] - dataSubset['4/30/2015']) / dataSubset['4/30/2015'])
#dataSubset.dtypes
dataSubset.head()
| RegionName | City | Metro | 4/30/2015 | 5/31/2015 | 6/30/2015 | 7/31/2015 | 8/31/2015 | 9/30/2015 | 10/31/2015 | ... | 7/31/2019 | 8/31/2019 | 9/30/2019 | 10/31/2019 | 11/30/2019 | 12/31/2019 | 1/31/2020 | 2/29/2020 | 3/31/2020 | growth5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10025 | New York | New York-Newark-Jersey City | 1356525.0 | 1350097.0 | 1355930.0 | 1362446.0 | 1375476.0 | 1380169.0 | 1388270.0 | ... | 1405862 | 1402547 | 1390420 | 1381621 | 1375725 | 1374714 | 1381453 | 1385737 | 1389268 | 0.024137 |
| 1 | 60657 | Chicago | Chicago-Naperville-Elgin | 903862.0 | 912966.0 | 915000.0 | 916602.0 | 920152.0 | 923585.0 | 923550.0 | ... | 975616 | 975734 | 975251 | 974238 | 973104 | 971908 | 972038 | 973671 | 975642 | 0.079415 |
| 2 | 10023 | New York | New York-Newark-Jersey City | 1495385.0 | 1494490.0 | 1506862.0 | 1513489.0 | 1515314.0 | 1517631.0 | 1517245.0 | ... | 1514894 | 1502233 | 1492429 | 1486122 | 1480426 | 1476509 | 1478980 | 1479301 | 1474994 | -0.013636 |
| 3 | 77494 | Katy | Houston-The Woodlands-Sugar Land | 334607.0 | 336254.0 | 337852.0 | 338657.0 | 339058.0 | 339256.0 | 339524.0 | ... | 335878 | 335940 | 336092 | 336119 | 336083 | 336154 | 335860 | 336037 | 336483 | 0.005607 |
| 4 | 60614 | Chicago | Chicago-Naperville-Elgin | 1098788.0 | 1110803.0 | 1114983.0 | 1119110.0 | 1123493.0 | 1126324.0 | 1126873.0 | ... | 1208853 | 1208481 | 1206304 | 1204013 | 1201182 | 1198879 | 1198277 | 1199900 | 1200980 | 0.093004 |
5 rows × 64 columns
dataSubset.growth5yr.describe()
count 22799.000000 mean 0.261567 std 0.166180 min -0.387390 25% 0.157901 50% 0.245272 75% 0.338850 max 2.480128 Name: growth5yr, dtype: float64
# Create a subset of the top 25% of zip codes by 5-year ROI.
dataSubset1 = dataSubset[(dataSubset["growth5yr"] > 0.33)]
dataSubset1.head
<bound method NDFrame.head of RegionName City Metro \
10 11226 New York New York-Newark-Jersey City
11 10467 New York New York-Newark-Jersey City
13 94109 San Francisco San Francisco-Oakland-Hayward
16 37013 Nashville Nashville-Davidson--Murfreesboro--Franklin
22 90250 Hawthorne Los Angeles-Long Beach-Anaheim
... ... ... ...
30437 20052 Washington Washington-Arlington-Alexandria
30447 89155 Las Vegas Las Vegas-Henderson-Paradise
30450 73019 Norman Oklahoma City
30455 4109 Portland Portland-South Portland
30461 822 Choudrant Ruston
4/30/2015 5/31/2015 6/30/2015 7/31/2015 8/31/2015 9/30/2015 \
10 754460.0 764790.0 775882.0 784386.0 792193.0 801734.0
11 385649.0 389844.0 396309.0 402295.0 404724.0 405229.0
13 2249311.0 2262070.0 2298533.0 2339045.0 2357945.0 2358890.0
16 156335.0 157701.0 159064.0 160430.0 161780.0 163540.0
22 484598.0 487418.0 490345.0 492849.0 494992.0 497337.0
... ... ... ... ... ... ...
30437 987663.0 989807.0 994089.0 1001726.0 1006871.0 1019119.0
30447 282162.0 284694.0 286128.0 286409.0 287003.0 288713.0
30450 147728.0 148745.0 149309.0 150125.0 151482.0 153053.0
30455 471394.0 473393.0 472507.0 474931.0 478466.0 480854.0
30461 134480.0 135197.0 135976.0 136703.0 136099.0 136454.0
10/31/2015 ... 7/31/2019 8/31/2019 9/30/2019 10/31/2019 \
10 811120.0 ... 1086222 1081349 1077718 1077732
11 403486.0 ... 523086 523928 523450 524723
13 2358210.0 ... 3050229 3057067 3064113 3075367
16 165062.0 ... 241604 242314 243053 244175
22 501123.0 ... 650849 652746 655423 658568
... ... ... ... ... ... ...
30437 1031568.0 ... 1314342 1322163 1332054 1339939
30447 290502.0 ... 376620 375529 375639 376837
30450 154358.0 ... 194014 194549 196004 197268
30455 481833.0 ... 612420 615999 620066 623246
30461 136515.0 ... 172371 172732 173377 174192
11/30/2019 12/31/2019 1/31/2020 2/29/2020 3/31/2020 growth5yr
10 1077017 1077031 1071354 1067056 1063984 0.410259
11 527675 532627 534816 536316 532923 0.381886
13 3091849 3124223 3149517 3177555 3199764 0.422553
16 245482 246768 248066 249378 250997 0.605507
22 661550 666610 672302 680178 687522 0.418747
... ... ... ... ... ... ...
30437 1349835 1357347 1363194 1367376 1367603 0.384686
30447 379157 381329 384148 386081 389510 0.380448
30450 198517 199381 200973 202549 204808 0.386386
30455 625528 627467 632956 639095 645736 0.369843
30461 175176 175615 176689 178685 181195 0.347375
[6117 rows x 64 columns]>
dataSubset1.sort_values(by=['growth5yr'], inplace=True, ascending=False)
dataSubset1.head
c:\users\nosta\.conda\envs\v-env\lib\site-packages\pandas\util\_decorators.py:311: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy return func(*args, **kwargs)
<bound method NDFrame.head of RegionName City Metro 4/30/2015 \
7981 43205 Columbus Columbus 59732.0
6444 30314 Atlanta Atlanta-Sandy Springs-Roswell 54298.0
3267 30310 Atlanta Atlanta-Sandy Springs-Roswell 78299.0
2962 32208 Jacksonville Jacksonville 39348.0
2146 30311 Atlanta Atlanta-Sandy Springs-Roswell 81877.0
... ... ... ... ...
16191 45845 Fort Loramie Sidney 155509.0
10092 30755 Tunnel Hill Dalton 101911.0
9931 31064 Monticello Atlanta-Sandy Springs-Roswell 108196.0
11113 29123 Pelion Columbia 103525.0
7470 45432 Beavercreek Dayton 106241.0
5/31/2015 6/30/2015 7/31/2015 8/31/2015 9/30/2015 10/31/2015 ... \
7981 59449.0 59635.0 60328.0 61688.0 62798.0 63811.0 ...
6444 55001.0 55316.0 56172.0 56842.0 57285.0 57394.0 ...
3267 79488.0 80291.0 80787.0 81228.0 82167.0 83847.0 ...
2962 38848.0 38454.0 38537.0 39057.0 39365.0 39521.0 ...
2146 83881.0 84995.0 85852.0 85729.0 86212.0 86945.0 ...
... ... ... ... ... ... ... ...
16191 156637.0 157450.0 158157.0 159064.0 159861.0 160746.0 ...
10092 102381.0 102757.0 103123.0 103404.0 103766.0 104274.0 ...
9931 108616.0 108977.0 109422.0 109726.0 109949.0 110099.0 ...
11113 103477.0 103818.0 104003.0 103604.0 103106.0 101973.0 ...
7470 105757.0 105796.0 106121.0 106494.0 106870.0 107223.0 ...
7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 12/31/2019 \
7981 190884 194402 197836 200347 202565 204400
6444 157172 159175 161203 163443 166123 169741
3267 228698 229073 229733 230116 231321 232872
2962 100501 101266 101774 102537 102687 103488
2146 195887 196764 197597 199317 201088 203973
... ... ... ... ... ... ...
16191 203323 204377 205350 206136 206492 206569
10092 131654 132315 133061 134092 134778 135088
9931 139591 140420 141106 141706 141983 142221
11113 131880 132433 133062 133671 133863 134078
7470 135205 135975 136761 137743 138649 139225
1/31/2020 2/29/2020 3/31/2020 growth5yr
7981 205186 206203 207875 2.480128
6444 174383 180050 184547 2.398781
3267 235007 237397 238612 2.047446
2962 104364 105567 107090 1.721612
2146 206085 208123 208673 1.548616
... ... ... ... ...
16191 206643 206712 206840 0.330084
10092 134960 135085 135550 0.330082
9931 142512 143244 143908 0.330068
11113 134540 136104 137694 0.330056
7470 139787 140331 141303 0.330023
[6117 rows x 64 columns]>
dataTop30 = dataSubset1.iloc[0:30,:]
dataTop30.reset_index(drop=True, inplace=True)
dataTop30
| RegionName | City | Metro | 4/30/2015 | 5/31/2015 | 6/30/2015 | 7/31/2015 | 8/31/2015 | 9/30/2015 | 10/31/2015 | ... | 7/31/2019 | 8/31/2019 | 9/30/2019 | 10/31/2019 | 11/30/2019 | 12/31/2019 | 1/31/2020 | 2/29/2020 | 3/31/2020 | growth5yr | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43205 | Columbus | Columbus | 59732.0 | 59449.0 | 59635.0 | 60328.0 | 61688.0 | 62798.0 | 63811.0 | ... | 190884 | 194402 | 197836 | 200347 | 202565 | 204400 | 205186 | 206203 | 207875 | 2.480128 |
| 1 | 30314 | Atlanta | Atlanta-Sandy Springs-Roswell | 54298.0 | 55001.0 | 55316.0 | 56172.0 | 56842.0 | 57285.0 | 57394.0 | ... | 157172 | 159175 | 161203 | 163443 | 166123 | 169741 | 174383 | 180050 | 184547 | 2.398781 |
| 2 | 30310 | Atlanta | Atlanta-Sandy Springs-Roswell | 78299.0 | 79488.0 | 80291.0 | 80787.0 | 81228.0 | 82167.0 | 83847.0 | ... | 228698 | 229073 | 229733 | 230116 | 231321 | 232872 | 235007 | 237397 | 238612 | 2.047446 |
| 3 | 32208 | Jacksonville | Jacksonville | 39348.0 | 38848.0 | 38454.0 | 38537.0 | 39057.0 | 39365.0 | 39521.0 | ... | 100501 | 101266 | 101774 | 102537 | 102687 | 103488 | 104364 | 105567 | 107090 | 1.721612 |
| 4 | 30311 | Atlanta | Atlanta-Sandy Springs-Roswell | 81877.0 | 83881.0 | 84995.0 | 85852.0 | 85729.0 | 86212.0 | 86945.0 | ... | 195887 | 196764 | 197597 | 199317 | 201088 | 203973 | 206085 | 208123 | 208673 | 1.548616 |
| 5 | 30032 | Candler-Mcafee | Atlanta-Sandy Springs-Roswell | 83564.0 | 83412.0 | 82708.0 | 82401.0 | 82830.0 | 83877.0 | 84646.0 | ... | 204159 | 205479 | 206550 | 207828 | 208568 | 209173 | 209845 | 210278 | 211567 | 1.531796 |
| 6 | 75224 | Dallas | Dallas-Fort Worth-Arlington | 73182.0 | 74731.0 | 76706.0 | 78521.0 | 79636.0 | 80401.0 | 81311.0 | ... | 171017 | 172292 | 174196 | 176359 | 178328 | 179770 | 181452 | 183072 | 184503 | 1.521153 |
| 7 | 90211 | Beverly Hills | Los Angeles-Long Beach-Anaheim | 1010992.0 | 1029976.0 | 1047114.0 | 1065646.0 | 1079371.0 | 1097075.0 | 1118852.0 | ... | 2253956 | 2276149 | 2300187 | 2329400 | 2354435 | 2384830 | 2423355 | 2485106 | 2544209 | 1.516547 |
| 8 | 75141 | Hutchins | Dallas-Fort Worth-Arlington | 65704.0 | 66877.0 | 68252.0 | 69665.0 | 70089.0 | 70460.0 | 71091.0 | ... | 156644 | 157846 | 159603 | 160230 | 160677 | 160674 | 161414 | 162083 | 162905 | 1.479377 |
| 9 | 33605 | Tampa | Tampa-St. Petersburg-Clearwater | 61869.0 | 62487.0 | 63497.0 | 64472.0 | 65326.0 | 66166.0 | 67646.0 | ... | 144583 | 145637 | 146810 | 148029 | 149508 | 150954 | 152098 | 152722 | 153161 | 1.475569 |
| 10 | 75216 | Dallas | Dallas-Fort Worth-Arlington | 53352.0 | 53868.0 | 54211.0 | 54347.0 | 54368.0 | 54096.0 | 54092.0 | ... | 112312 | 113873 | 115341 | 117732 | 120073 | 122824 | 125318 | 127701 | 130018 | 1.436985 |
| 11 | 75226 | Dallas | Dallas-Fort Worth-Arlington | 113238.0 | 116164.0 | 119562.0 | 123550.0 | 127292.0 | 129986.0 | 131829.0 | ... | 256599 | 257564 | 261018 | 264914 | 268522 | 269731 | 270038 | 270456 | 271087 | 1.393958 |
| 12 | 30315 | Atlanta | Atlanta-Sandy Springs-Roswell | 71538.0 | 72714.0 | 73629.0 | 74431.0 | 74956.0 | 75601.0 | 76607.0 | ... | 152134 | 153950 | 155060 | 156239 | 157286 | 158660 | 160835 | 163337 | 165721 | 1.316545 |
| 13 | 43206 | Columbus | Columbus | 85523.0 | 85853.0 | 86088.0 | 86496.0 | 87244.0 | 88243.0 | 89154.0 | ... | 176182 | 177900 | 180128 | 182619 | 185234 | 187714 | 190341 | 193123 | 195342 | 1.284087 |
| 14 | 98421 | Tacoma | Seattle-Tacoma-Bellevue | 133730.0 | 134995.0 | 134850.0 | 134770.0 | 134217.0 | 135001.0 | 136125.0 | ... | 264499 | 268636 | 274340 | 279269 | 283665 | 287247 | 291513 | 298216 | 305387 | 1.283609 |
| 15 | 75217 | Dallas | Dallas-Fort Worth-Arlington | 64522.0 | 65644.0 | 66824.0 | 68230.0 | 69180.0 | 70424.0 | 71518.0 | ... | 138382 | 139289 | 140926 | 141688 | 142530 | 142410 | 143519 | 144443 | 145580 | 1.256285 |
| 16 | 75223 | Dallas | Dallas-Fort Worth-Arlington | 89484.0 | 91668.0 | 93980.0 | 97157.0 | 100680.0 | 103675.0 | 105262.0 | ... | 191050 | 192908 | 195547 | 198304 | 200625 | 201945 | 202304 | 201909 | 201469 | 1.251453 |
| 17 | 91108 | San Marino | Los Angeles-Long Beach-Anaheim | 1048090.0 | 1063476.0 | 1080509.0 | 1104028.0 | 1126157.0 | 1145866.0 | 1165616.0 | ... | 2117800 | 2144557 | 2169944 | 2195451 | 2223419 | 2253340 | 2283083 | 2318960 | 2352497 | 1.244556 |
| 18 | 33711 | Saint Petersburg | Tampa-St. Petersburg-Clearwater | 71563.0 | 71430.0 | 71870.0 | 72688.0 | 73494.0 | 74394.0 | 74934.0 | ... | 150140 | 150379 | 151034 | 151741 | 153117 | 154375 | 155797 | 157379 | 159365 | 1.226919 |
| 19 | 77011 | Houston | Houston-The Woodlands-Sugar Land | 79650.0 | 80907.0 | 82088.0 | 83163.0 | 84133.0 | 84816.0 | 85793.0 | ... | 155022 | 157283 | 160535 | 164008 | 166483 | 169004 | 170951 | 174458 | 177321 | 1.226252 |
| 20 | 75232 | Dallas | Dallas-Fort Worth-Arlington | 80585.0 | 82317.0 | 83718.0 | 85305.0 | 86562.0 | 87511.0 | 88093.0 | ... | 173650 | 174133 | 175061 | 175096 | 175629 | 175744 | 176853 | 177710 | 178540 | 1.215549 |
| 21 | 75208 | Dallas | Dallas-Fort Worth-Arlington | 147007.0 | 150366.0 | 153899.0 | 156792.0 | 158640.0 | 160409.0 | 162980.0 | ... | 301616 | 303055 | 307351 | 312162 | 316847 | 318606 | 321020 | 323218 | 325442 | 1.213786 |
| 22 | 78208 | San Antonio | San Antonio-New Braunfels | 79757.0 | 80332.0 | 80822.0 | 82013.0 | 83745.0 | 85468.0 | 87052.0 | ... | 166799 | 167070 | 167689 | 167885 | 168250 | 169095 | 171091 | 173311 | 176122 | 1.208233 |
| 23 | 48342 | Pontiac | Detroit-Warren-Dearborn | 32382.0 | 32811.0 | 33344.0 | 33532.0 | 34348.0 | 34878.0 | 35187.0 | ... | 63330 | 64004 | 64690 | 65945 | 67151 | 68186 | 69048 | 70099 | 71124 | 1.196405 |
| 24 | 75203 | Dallas | Dallas-Fort Worth-Arlington | 65039.0 | 65574.0 | 65898.0 | 65974.0 | 66261.0 | 67034.0 | 67731.0 | ... | 133441 | 134089 | 135531 | 136508 | 137643 | 138156 | 139774 | 141231 | 142834 | 1.196128 |
| 25 | 30354 | Atlanta | Atlanta-Sandy Springs-Roswell | 73178.0 | 74457.0 | 75264.0 | 76247.0 | 77194.0 | 79374.0 | 80812.0 | ... | 147883 | 149280 | 150827 | 152141 | 153322 | 154593 | 156421 | 158758 | 159847 | 1.184359 |
| 26 | 48030 | Hazel Park | Detroit-Warren-Dearborn | 55006.0 | 56026.0 | 56512.0 | 56605.0 | 56932.0 | 57398.0 | 58107.0 | ... | 110385 | 111184 | 112007 | 113280 | 114734 | 115863 | 117048 | 118151 | 120036 | 1.182235 |
| 27 | 48238 | Detroit | Detroit-Warren-Dearborn | 12081.0 | 12029.0 | 12128.0 | 12177.0 | 12070.0 | 11904.0 | 11939.0 | ... | 26332 | 26629 | 26865 | 27201 | 27647 | 27733 | 27343 | 26673 | 26228 | 1.171012 |
| 28 | 75180 | Balch Springs | Dallas-Fort Worth-Arlington | 74781.0 | 75386.0 | 75969.0 | 77069.0 | 77908.0 | 78966.0 | 79661.0 | ... | 156303 | 157088 | 158333 | 159191 | 160419 | 160611 | 161181 | 161531 | 162341 | 1.170886 |
| 29 | 90004 | Los Angeles | Los Angeles-Long Beach-Anaheim | 738130.0 | 747814.0 | 758704.0 | 770994.0 | 781514.0 | 795647.0 | 808417.0 | ... | 1466195 | 1477259 | 1491262 | 1505165 | 1521283 | 1538228 | 1557901 | 1579513 | 1599806 | 1.167377 |
30 rows × 64 columns
A geographic map visualization was created using the upper quartile ROI data. The purpose of this map is to visually identify clusters of high growth zip codes in the same metro areas. Dense clusters can be seen around Tacoma-Washington, Los Angeles-California, Dallas-Texas, Columbus-Ohio, Atlanta-Georgia, and Tampa-Florida.
# Read in and inspect the Zip Code and Geolocation Data file
zipData = pd.read_csv('US Zip Codes from 2013 Government Data.csv')
zipData = zipData.rename(columns={"ZIP": "RegionName"})
print(zipData.head(10))
print('Data Shape:', zipData.shape)
RegionName LAT LNG 0 601 18.180555 -66.749961 1 602 18.361945 -67.175597 2 603 18.455183 -67.119887 3 606 18.158345 -66.932911 4 610 18.295366 -67.125135 5 612 18.402253 -66.711397 6 616 18.420412 -66.671979 7 617 18.445147 -66.559696 8 622 17.991245 -67.153993 9 623 18.083361 -67.153897 Data Shape: (33144, 3)
dataGeoLoc = dataSubset1.merge(zipData, on='RegionName', how='left', indicator=True)
print(dataGeoLoc.head(10))
print('Data Shape:', dataGeoLoc.shape)
RegionName City Metro 4/30/2015 \ 0 43205 Columbus Columbus 59732.0 1 30314 Atlanta Atlanta-Sandy Springs-Roswell 54298.0 2 30310 Atlanta Atlanta-Sandy Springs-Roswell 78299.0 3 32208 Jacksonville Jacksonville 39348.0 4 30311 Atlanta Atlanta-Sandy Springs-Roswell 81877.0 5 30032 Candler-Mcafee Atlanta-Sandy Springs-Roswell 83564.0 6 75224 Dallas Dallas-Fort Worth-Arlington 73182.0 7 90211 Beverly Hills Los Angeles-Long Beach-Anaheim 1010992.0 8 75141 Hutchins Dallas-Fort Worth-Arlington 65704.0 9 33605 Tampa Tampa-St. Petersburg-Clearwater 61869.0 5/31/2015 6/30/2015 7/31/2015 8/31/2015 9/30/2015 10/31/2015 ... \ 0 59449.0 59635.0 60328.0 61688.0 62798.0 63811.0 ... 1 55001.0 55316.0 56172.0 56842.0 57285.0 57394.0 ... 2 79488.0 80291.0 80787.0 81228.0 82167.0 83847.0 ... 3 38848.0 38454.0 38537.0 39057.0 39365.0 39521.0 ... 4 83881.0 84995.0 85852.0 85729.0 86212.0 86945.0 ... 5 83412.0 82708.0 82401.0 82830.0 83877.0 84646.0 ... 6 74731.0 76706.0 78521.0 79636.0 80401.0 81311.0 ... 7 1029976.0 1047114.0 1065646.0 1079371.0 1097075.0 1118852.0 ... 8 66877.0 68252.0 69665.0 70089.0 70460.0 71091.0 ... 9 62487.0 63497.0 64472.0 65326.0 66166.0 67646.0 ... 10/31/2019 11/30/2019 12/31/2019 1/31/2020 2/29/2020 3/31/2020 \ 0 200347 202565 204400 205186 206203 207875 1 163443 166123 169741 174383 180050 184547 2 230116 231321 232872 235007 237397 238612 3 102537 102687 103488 104364 105567 107090 4 199317 201088 203973 206085 208123 208673 5 207828 208568 209173 209845 210278 211567 6 176359 178328 179770 181452 183072 184503 7 2329400 2354435 2384830 2423355 2485106 2544209 8 160230 160677 160674 161414 162083 162905 9 148029 149508 150954 152098 152722 153161 growth5yr LAT LNG _merge 0 2.480128 39.957019 -82.962071 both 1 2.398781 33.757576 -84.432245 both 2 2.047446 33.726586 -84.425995 both 3 1.721612 30.393090 -81.682946 both 4 1.548616 33.723262 -84.475994 both 5 1.531796 33.740666 -84.264463 both 6 1.521153 32.711405 -96.838591 both 7 1.516547 34.064958 -118.382979 both 8 1.479377 32.637333 -96.697609 both 9 1.475569 27.951122 -82.429680 both [10 rows x 67 columns] Data Shape: (6117, 67)
dataGeoLoc['text'] = dataGeoLoc['City'] + '<br>growth5yr:' + dataGeoLoc['growth5yr'].astype(str)
limits = [(0,50),(51,100),(101,150),(151,200),(201,250)]
colors = ["royalblue","crimson","lightseagreen","orange","lightgrey"]
cities = []
fig = go.Figure()
for i in range(len(limits)):
lim = limits[i]
df_sub = dataGeoLoc[lim[0]:lim[1]]
fig.add_trace(go.Scattergeo(
locationmode = 'USA-states',
lon = df_sub['LNG'],
lat = df_sub['LAT'],
text = df_sub['text'],
marker = dict(
size = df_sub['growth5yr']*50,
color = colors[i],
line_color='rgb(40,40,40)',
line_width=0.5,
sizemode = 'area'
),
name = '{0} - {1}'.format(lim[0],lim[1])))
fig.update_layout(
title_text = 'High 5-Year ROI Zip Code Locations',
showlegend = False,
geo = dict(
scope = 'usa',
landcolor = 'rgb(217, 217, 217)',
)
)
fig.show()
The property values data for four Arkansas Metro Areas was created by taking the mean of the SFR prices in the associated zip codes over time. The resulting data is a representative SFR value by metro area, one observation per metro area. Next the 24-year ROI was taken for each of the four representative observations. Fayetteville had the highest ROI at 90.6% over 24 years.
24 Year ROI (1996-2020)
A chart can be made to show the values over time. There are several observations that can made by looking at a line graph of the data. The metro area with the highest average SFR prices, Fayetteville, performed the best over time. The metro area with the lowest average SFR prices, Searcy, performed the worst over time. This trend is not entirely consistent, because Hot Springs has lower prices on average than Little Rock, but Hot Springs had slightly higher growth. Fayetteville has experienced a higher growth rate in the past five years, characterized by a steep incline in the graph.
Another interesting feature is the peak in SFR values between 2007 and 2008. There was a global economic market crash in 2008 and the downward slope between 2008 and 2012 shows the effect on SFR values for all three metro areas. Fayetteville had the steepest correction. Searcy shows a slight downward trend but remains the most level of the four metro areas.
In terms of performance, it should be noted that Fayetteville had the highest growth, but also the highest volatility. One thing the SREIT will have to determine is a risk-reward tolerance. In hindsight the best option for a 24-year investment among these four Arkansas metro areas was to invest in Fayetteville. If the timeframe for the investment was 2008 to 2012, Fayetteville would be the worst option. The final recommendation will be to invest in an area where both growth and volatility are taken into account.
arkansasSubset = data[(data["State"] == "AR")]
# Write the table to a csv
arkansasSubset.to_csv(r'arkansasSubset.csv', index = False)
print(arkansasSubset.head(10))
print('Data Shape:', arkansasSubset.shape)
RegionID SizeRank RegionName RegionType StateName State City \
331 89249 332 71913 Zip AR AR Hot Springs
663 89319 664 72034 Zip AR AR Conway
868 89707 870 72701 Zip AR AR Fayetteville
913 89749 916 72764 Zip AR AR Springdale
1230 89528 1236 72401 Zip AR AR Jonesboro
1340 58069 1347 727 Zip AR AR Walnut Ridge
1584 89745 1592 72758 Zip AR AR Rogers
1644 89743 1653 72756 Zip AR AR Rogers
1662 89709 1671 72703 Zip AR AR Fayetteville
1664 89355 1673 72076 Zip AR AR Jacksonville
Metro CountyName 1/31/1996 ... \
331 Hot Springs Garland County 91204.0 ...
663 Little Rock-North Little Rock-Conway Faulkner County 97184.0 ...
868 Fayetteville-Springdale-Rogers Washington County 92521.0 ...
913 Fayetteville-Springdale-Rogers Washington County 90921.0 ...
1230 Jonesboro Craighead County 72002.0 ...
1340 NaN Lawrence County NaN ...
1584 Fayetteville-Springdale-Rogers Benton County 134859.0 ...
1644 Fayetteville-Springdale-Rogers Benton County 101606.0 ...
1662 Fayetteville-Springdale-Rogers Washington County 113540.0 ...
1664 Little Rock-North Little Rock-Conway Pulaski County 69907.0 ...
6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \
331 158061 158528 159213 160118 161110 161806
663 182649 183101 183628 184046 184578 185073
868 224812 225642 226377 227098 227647 227829
913 171834 172676 173534 174342 174884 175082
1230 128201 128717 129317 129813 130500 131228
1340 91324 91745 91521 91633 92007 92444
1584 239480 241043 242507 243466 244308 245707
1644 177739 179187 180503 181458 182293 183489
1662 240066 240771 241262 241907 242253 242286
1664 103851 103607 103587 103564 103832 104133
12/31/2019 1/31/2020 2/29/2020 3/31/2020
331 162347 163134 164485 166238
663 185563 186034 186445 186853
868 228159 228725 229374 229775
913 175275 175506 175605 175228
1230 131973 132622 133103 133493
1340 92555 91878 91449 90295
1584 246999 248698 249534 250528
1644 184719 186236 187096 187937
1662 242405 242717 243144 243194
1664 104496 104776 105376 106351
[10 rows x 300 columns]
Data Shape: (569, 300)
# Create a subset for Hot Springs Metro Area
hotspringsARSubset = arkansasSubset[(arkansasSubset["Metro"] == "Hot Springs")]
# Limit to only the property value data
hotspringsARSubset = hotspringsARSubset.iloc[:, 9:300]
# Add a row of mean values
hsAR_mean = hotspringsARSubset.mean()
hotspringsARSubset = hotspringsARSubset.append(hsAR_mean, ignore_index=True)
print(hotspringsARSubset.head(8))
print('Data Shape:', hotspringsARSubset.shape)
1/31/1996 2/29/1996 3/31/1996 4/30/1996 5/31/1996 6/30/1996 \
0 91204.0 90258.0 90182.0 90125.0 90644.0 90944.0
1 56746.0 56160.0 56154.0 56047.0 56416.0 56735.0
2 95089.0 94846.0 95517.0 96464.0 97670.0 97257.0
3 78239.0 77565.0 78194.0 78844.0 79656.0 79362.0
4 NaN NaN NaN NaN NaN NaN
5 71456.0 70979.0 69888.0 68562.0 66946.0 66540.0
6 NaN NaN NaN NaN NaN NaN
7 78546.8 77961.6 77987.0 78008.4 78266.4 78167.6
7/31/1996 8/31/1996 9/30/1996 10/31/1996 ... 6/30/2019 \
0 90806.0 90548.0 90160.0 90116.0 ... 158061.000000
1 56913.0 57010.0 56905.0 56987.0 ... 122540.000000
2 96187.0 94575.0 94163.0 94693.0 ... 171342.000000
3 78999.0 78870.0 79142.0 79542.0 ... 160643.000000
4 NaN NaN NaN NaN ... 71682.000000
5 65988.0 65833.0 65469.0 65403.0 ... 91633.000000
6 NaN NaN NaN NaN ... 135462.000000
7 77778.6 77367.2 77167.8 77348.2 ... 130194.714286
7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \
0 158528.000000 159213.000000 160118.000000 161110.0 161806.000000
1 122765.000000 123312.000000 124166.000000 125133.0 125741.000000
2 171551.000000 171766.000000 172953.000000 174676.0 176282.000000
3 160913.000000 160885.000000 161506.000000 162368.0 163294.000000
4 71921.000000 72636.000000 73420.000000 74118.0 75018.000000
5 90917.000000 91161.000000 92369.000000 94352.0 95600.000000
6 135293.000000 135530.000000 136114.000000 136961.0 137149.000000
7 130269.714286 130643.285714 131520.857143 132674.0 133555.714286
12/31/2019 1/31/2020 2/29/2020 3/31/2020
0 162347.000000 163134.000000 164485.000000 166238.000000
1 126252.000000 126854.000000 128023.000000 129745.000000
2 177930.000000 179789.000000 182266.000000 184162.000000
3 164107.000000 165067.000000 166484.000000 167870.000000
4 76052.000000 76815.000000 77365.000000 78301.000000
5 96312.000000 96171.000000 96568.000000 97842.000000
6 138163.000000 138781.000000 140289.000000 141090.000000
7 134451.857143 135230.142857 136497.142857 137892.571429
[8 rows x 291 columns]
Data Shape: (8, 291)
# Create a subset for Little Rock Metro Area
littlerockARSubset = arkansasSubset[(arkansasSubset["Metro"] == "Little Rock-North Little Rock-Conway")]
# Limit to only the property value data
littlerockARSubset = littlerockARSubset.iloc[:, 9:300]
# Add a row of mean values
lrAR_mean = littlerockARSubset.mean()
littlerockARSubset = littlerockARSubset.append(lrAR_mean, ignore_index=True)
print(littlerockARSubset.head(63))
print('Data Shape:', littlerockARSubset.shape)
1/31/1996 2/29/1996 3/31/1996 4/30/1996 5/31/1996 \
0 97184.000000 97271.000 97483.000000 97823.000000 98322.000000
1 69907.000000 70173.000 70579.000000 71193.000000 71619.000000
2 89345.000000 89387.000 89466.000000 89506.000000 89562.000000
3 97555.000000 97410.000 97591.000000 97667.000000 97900.000000
4 47506.000000 47928.000 48507.000000 49357.000000 50065.000000
.. ... ... ... ... ...
58 NaN NaN NaN NaN NaN
59 NaN NaN NaN NaN NaN
60 NaN NaN NaN NaN NaN
61 NaN NaN NaN NaN NaN
62 84050.083333 84245.125 84856.102041 85225.836735 85586.387755
6/30/1996 7/31/1996 8/31/1996 9/30/1996 10/31/1996 \
0 98916.000000 99686.000000 100513.000000 101273.000000 101918.000000
1 71798.000000 71923.000000 72117.000000 72295.000000 72592.000000
2 89692.000000 89974.000000 90346.000000 90667.000000 91098.000000
3 97814.000000 97901.000000 97853.000000 98086.000000 98203.000000
4 50631.000000 51218.000000 51546.000000 51622.000000 51606.000000
.. ... ... ... ... ...
58 NaN NaN NaN NaN NaN
59 NaN NaN NaN NaN NaN
60 NaN NaN NaN NaN NaN
61 NaN NaN NaN NaN NaN
62 85833.530612 86127.183673 86351.428571 86542.897959 86784.673469
... 6/30/2019 7/31/2019 8/31/2019 9/30/2019 \
0 ... 182649.000000 183101.000000 183628.000000 184046.000000
1 ... 103851.000000 103607.000000 103587.000000 103564.000000
2 ... 150961.000000 150778.000000 151006.000000 151204.000000
3 ... 156267.000000 156418.000000 156633.000000 156819.000000
4 ... 72297.000000 72162.000000 72250.000000 72332.000000
.. ... ... ... ... ...
58 ... 65388.000000 65434.000000 65445.000000 65388.000000
59 ... 59245.000000 59170.000000 58911.000000 58566.000000
60 ... 134707.000000 135348.000000 136276.000000 137688.000000
61 ... 67622.000000 67579.000000 67519.000000 67899.000000
62 ... 137957.419355 137967.725806 138232.258065 138562.548387
10/31/2019 11/30/2019 12/31/2019 1/31/2020 2/29/2020 \
0 184578.000000 185073.000000 185563.000000 186034.000000 186445.000000
1 103832.000000 104133.000000 104496.000000 104776.000000 105376.000000
2 151506.000000 151683.000000 151884.000000 152037.000000 152310.000000
3 157089.000000 157254.000000 157396.000000 157435.000000 157619.000000
4 72976.000000 73492.000000 74355.000000 75124.000000 76106.000000
.. ... ... ... ... ...
58 65387.000000 65809.000000 66591.000000 67047.000000 66832.000000
59 58861.000000 59445.000000 60058.000000 60597.000000 61254.000000
60 138926.000000 139993.000000 140913.000000 142110.000000 143216.000000
61 68879.000000 69634.000000 70435.000000 71358.000000 72478.000000
62 139092.919355 139401.790323 139585.387097 139714.419355 139935.032258
3/31/2020
0 186853.0
1 106351.0
2 153029.0
3 157867.0
4 76895.0
.. ...
58 66624.0
59 61847.0
60 144633.0
61 73570.0
62 140491.0
[63 rows x 291 columns]
Data Shape: (63, 291)
# Create a subset for Fayetteville Metro Area
fayettevilleARSubset = arkansasSubset[(arkansasSubset["Metro"] == "Fayetteville-Springdale-Rogers")]
# Limit to only the property value data
fayettevilleARSubset = fayettevilleARSubset.iloc[:, 9:300]
# Add a row of mean values
fvAR_mean = fayettevilleARSubset.mean()
fayettevilleARSubset = fayettevilleARSubset.append(fvAR_mean, ignore_index=True)
print(fayettevilleARSubset.head(41))
print('Data Shape:', fayettevilleARSubset.shape)
1/31/1996 2/29/1996 3/31/1996 4/30/1996 5/31/1996 \
0 92521.000000 92670.0 92803.0 93166.000000 93486.0
1 90921.000000 91003.0 91077.0 91203.000000 91367.0
2 134859.000000 134839.0 134875.0 134473.000000 134148.0
3 101606.000000 101357.0 101098.0 100762.000000 100498.0
4 113540.000000 113441.0 113529.0 113686.000000 113832.0
5 127404.000000 127443.0 127497.0 127638.000000 127814.0
6 108070.000000 108060.0 108082.0 108224.000000 108373.0
7 106373.000000 106346.0 106403.0 106466.000000 106504.0
8 86481.000000 86558.0 86682.0 86921.000000 87240.0
9 102499.000000 102399.0 102294.0 101757.000000 101180.0
10 103637.000000 103506.0 103434.0 103306.000000 103272.0
11 99412.000000 99677.0 99786.0 100363.000000 100892.0
12 95051.000000 95093.0 95113.0 94876.000000 94780.0
13 52535.000000 52630.0 52617.0 52594.000000 52493.0
14 75015.000000 75152.0 75327.0 75736.000000 76191.0
15 91725.000000 91568.0 91630.0 91686.000000 91762.0
16 83361.000000 83385.0 83540.0 83919.000000 84460.0
17 75583.000000 75622.0 75624.0 75858.000000 76095.0
18 81423.000000 81463.0 81621.0 81889.000000 82137.0
19 72612.000000 72556.0 72716.0 73260.000000 74289.0
20 78102.000000 78007.0 77886.0 77615.000000 77472.0
21 79391.000000 79713.0 80005.0 80560.000000 80899.0
22 138674.000000 139153.0 139317.0 140223.000000 141277.0
23 146225.000000 146616.0 147046.0 147869.000000 148640.0
24 84103.000000 84235.0 84622.0 84894.000000 85055.0
25 67668.000000 67568.0 67623.0 67739.000000 68115.0
26 72259.000000 71990.0 71922.0 71663.000000 71679.0
27 NaN NaN NaN NaN NaN
28 62438.000000 62489.0 62628.0 62494.000000 62193.0
29 NaN NaN NaN NaN NaN
30 NaN NaN NaN NaN NaN
31 NaN NaN NaN NaN NaN
32 100283.000000 100235.0 100413.0 100664.000000 101555.0
33 NaN NaN NaN NaN NaN
34 NaN NaN NaN NaN NaN
35 NaN 66747.0 66633.0 66708.000000 66975.0
36 NaN NaN NaN NaN NaN
37 NaN NaN NaN NaN NaN
38 NaN NaN NaN NaN NaN
39 NaN NaN NaN NaN NaN
40 93923.137931 93050.7 93128.1 93273.733333 93489.1
6/30/1996 7/31/1996 8/31/1996 9/30/1996 10/31/1996 \
0 93901.000000 94285.000000 94713.000000 95074.000000 95441.0
1 91523.000000 91801.000000 91980.000000 92221.000000 92345.0
2 134260.000000 135048.000000 136131.000000 136895.000000 138310.0
3 100575.000000 100659.000000 100841.000000 100768.000000 100250.0
4 113973.000000 114142.000000 114375.000000 114565.000000 114712.0
5 128466.000000 128951.000000 129630.000000 129769.000000 130205.0
6 108595.000000 108861.000000 109081.000000 109305.000000 109432.0
7 106519.000000 106573.000000 106655.000000 106784.000000 106882.0
8 87480.000000 87693.000000 87878.000000 87997.000000 87890.0
9 101208.000000 101721.000000 102348.000000 102684.000000 102271.0
10 103425.000000 103691.000000 104034.000000 104351.000000 104163.0
11 101624.000000 102044.000000 102712.000000 103297.000000 103957.0
12 94780.000000 95126.000000 95460.000000 95726.000000 96400.0
13 52378.000000 52287.000000 52318.000000 52284.000000 52384.0
14 76737.000000 77243.000000 77729.000000 78170.000000 78584.0
15 91861.000000 92000.000000 92199.000000 92338.000000 92412.0
16 85081.000000 85596.000000 86009.000000 86226.000000 86913.0
17 76633.000000 77058.000000 77650.000000 77983.000000 78142.0
18 82395.000000 82764.000000 83183.000000 83564.000000 84023.0
19 75583.000000 76598.000000 77389.000000 78185.000000 78667.0
20 77589.000000 77952.000000 78106.000000 78291.000000 78335.0
21 81037.000000 81061.000000 81205.000000 81512.000000 81888.0
22 142942.000000 145085.000000 146652.000000 146873.000000 145596.0
23 149488.000000 150033.000000 150797.000000 151423.000000 152463.0
24 85056.000000 85459.000000 86212.000000 86920.000000 87513.0
25 68510.000000 68988.000000 69539.000000 69893.000000 69834.0
26 71664.000000 71606.000000 71567.000000 71736.000000 72041.0
27 NaN NaN NaN NaN NaN
28 61585.000000 61270.000000 61036.000000 61126.000000 61266.0
29 NaN NaN NaN NaN NaN
30 NaN NaN NaN NaN NaN
31 NaN NaN NaN NaN NaN
32 102581.000000 103624.000000 104226.000000 105031.000000 105648.0
33 NaN NaN NaN NaN NaN
34 NaN NaN NaN NaN NaN
35 67381.000000 67461.000000 67868.000000 68195.000000 69452.0
36 NaN NaN NaN NaN NaN
37 NaN NaN NaN NaN NaN
38 NaN NaN NaN NaN NaN
39 NaN NaN NaN NaN NaN
40 93827.666667 94222.666667 94650.766667 94972.866667 95247.3
... 6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 \
0 ... 224812.000 225642.000 226377.000 227098.00 227647.0
1 ... 171834.000 172676.000 173534.000 174342.00 174884.0
2 ... 239480.000 241043.000 242507.000 243466.00 244308.0
3 ... 177739.000 179187.000 180503.000 181458.00 182293.0
4 ... 240066.000 240771.000 241262.000 241907.00 242253.0
5 ... 242735.000 244402.000 245966.000 247175.00 248322.0
6 ... 203174.000 203904.000 204672.000 205394.00 205941.0
7 ... 213128.000 213501.000 213957.000 214498.00 214830.0
8 ... 161480.000 162948.000 164202.000 165131.00 165778.0
9 ... 184896.000 186348.000 187620.000 188304.00 188731.0
10 ... 181162.000 182680.000 184105.000 184984.00 185651.0
11 ... 187588.000 188653.000 189687.000 190450.00 190961.0
12 ... 187308.000 188807.000 190069.000 191136.00 191872.0
13 ... 111307.000 111698.000 112204.000 112517.00 113032.0
14 ... 175394.000 175824.000 176293.000 176761.00 177476.0
15 ... 193247.000 193683.000 194157.000 194670.00 195120.0
16 ... 166801.000 167979.000 169092.000 169900.00 170463.0
17 ... 163074.000 165259.000 167012.000 168378.00 169196.0
18 ... 163406.000 164446.000 165791.000 166464.00 167514.0
19 ... 173763.000 176462.000 178608.000 180147.00 181025.0
20 ... 149585.000 150196.000 151284.000 152196.00 153150.0
21 ... 130596.000 131858.000 133789.000 135982.00 137899.0
22 ... 224158.000 225913.000 228112.000 229928.00 231621.0
23 ... 287461.000 289308.000 291002.000 291870.00 292492.0
24 ... 146784.000 148444.000 149676.000 149048.00 149451.0
25 ... 142518.000 145195.000 147431.000 149120.00 150296.0
26 ... 176718.000 177805.000 178963.000 179876.00 180926.0
27 ... 129105.000 133155.000 136750.000 139632.00 141444.0
28 ... 132478.000 133619.000 134719.000 135564.00 136495.0
29 ... 175343.000 175345.000 177286.000 180543.00 183479.0
30 ... 126321.000 128058.000 129708.000 131528.00 133494.0
31 ... 103624.000 104254.000 105007.000 105394.00 105573.0
32 ... 186479.000 189338.000 191763.000 193769.00 194327.0
33 ... 217082.000 216960.000 217273.000 217898.00 219849.0
34 ... 92910.000 93923.000 95137.000 96032.00 96272.0
35 ... 106536.000 107163.000 108089.000 108504.00 108835.0
36 ... 70560.000 71065.000 71603.000 71937.00 71957.0
37 ... 120245.000 120998.000 122403.000 124719.00 126608.0
38 ... 218225.000 219951.000 221901.000 224219.00 226889.0
39 ... 136689.000 137160.000 137861.000 139055.00 140006.0
40 ... 170895.275 172140.525 173434.375 174524.85 175459.0
11/30/2019 12/31/2019 1/31/2020 2/29/2020 3/31/2020
0 227829.000 228159.0 228725.00 229374.0 229775.00
1 175082.000 175275.0 175506.00 175605.0 175228.00
2 245707.000 246999.0 248698.00 249534.0 250528.00
3 183489.000 184719.0 186236.00 187096.0 187937.00
4 242286.000 242405.0 242717.00 243144.0 243194.00
5 249979.000 251455.0 253174.00 253985.0 254993.00
6 206086.000 206375.0 206782.00 207250.0 207241.00
7 214506.000 214157.0 213985.00 214004.0 213482.00
8 166705.000 167679.0 169162.00 170009.0 170817.00
9 189577.000 190541.0 191884.00 192615.0 193385.00
10 186801.000 187930.0 189464.00 190298.0 191346.00
11 191900.000 192692.0 193969.00 194573.0 195412.00
12 193122.000 194230.0 195736.00 196546.0 197558.00
13 113494.000 114065.0 114389.00 114968.0 115393.00
14 177445.000 177388.0 177198.00 177372.0 177041.00
15 195015.000 194856.0 194662.00 194585.0 193986.00
16 171435.000 172322.0 173645.00 174531.0 175484.00
17 170415.000 171436.0 172984.00 173741.0 174521.00
18 167928.000 168559.0 168714.00 169000.0 168770.00
19 182155.000 182844.0 184212.00 184953.0 185789.00
20 153604.000 154137.0 154326.00 154400.0 154013.00
21 138915.000 139299.0 139738.00 140705.0 140751.00
22 233449.000 235187.0 237865.00 239784.0 242029.00
23 293801.000 295166.0 296972.00 297592.0 298333.00
24 149560.000 151172.0 151840.00 152321.0 151349.00
25 151887.000 153113.0 154697.00 155483.0 156622.00
26 181788.000 182617.0 182996.00 183073.0 182739.00
27 142673.000 143531.0 144775.00 144971.0 145318.00
28 137091.000 137640.0 137781.00 138222.0 138204.00
29 186487.000 187914.0 190646.00 193359.0 197705.00
30 135723.000 136491.0 137293.00 138450.0 140531.00
31 105727.000 106134.0 106673.00 107779.0 108437.00
32 194703.000 194358.0 195764.00 196689.0 198077.00
33 221599.000 222443.0 220810.00 219407.0 218065.00
34 96059.000 96367.0 97077.00 98272.0 98975.00
35 108795.000 108817.0 108996.00 109122.0 108744.00
36 71996.000 72251.0 72515.00 72893.0 72824.00
37 128058.000 129182.0 131680.00 135279.0 138908.00
38 228258.000 228305.0 228683.00 228058.0 225814.00
39 140154.000 139586.0 139137.00 139678.0 139836.00
40 176282.075 176944.9 177802.65 178468.0 178978.85
[41 rows x 291 columns]
Data Shape: (41, 291)
# Create a subset for Searcy Metro Area
searcyARSubset = arkansasSubset[(arkansasSubset["Metro"] == "Searcy")]
# Limit to only the property value data
searcyARSubset = searcyARSubset.iloc[:, 9:300]
# Add a row of mean values
seAR_mean = searcyARSubset.mean()
searcyARSubset = searcyARSubset.append(seAR_mean, ignore_index=True)
print(searcyARSubset.head(19))
print('Data Shape:', searcyARSubset.shape)
1/31/1996 2/29/1996 3/31/1996 4/30/1996 5/31/1996 6/30/1996 \
0 75061.000000 75016.0 75024.000000 75013.0 75034.00 75071.0
1 71933.000000 71940.0 71984.000000 72025.0 72060.00 72094.0
2 56956.000000 56882.0 56823.000000 56730.0 56674.00 56576.0
3 54860.000000 54759.0 54764.000000 54726.0 54664.00 54446.0
4 46080.000000 45962.0 45902.000000 45761.0 45690.00 45703.0
5 68025.000000 67847.0 67947.000000 68109.0 68415.00 68599.0
6 52283.000000 52293.0 52377.000000 52481.0 52627.00 52836.0
7 57515.000000 57368.0 57523.000000 57683.0 58020.00 58182.0
8 NaN NaN NaN NaN NaN NaN
9 79268.000000 79091.0 79081.000000 79152.0 79294.00 79505.0
10 NaN NaN NaN NaN NaN NaN
11 44335.000000 44127.0 44052.000000 43988.0 43937.00 43894.0
12 53490.000000 53322.0 53280.000000 53157.0 53110.00 53158.0
13 21463.000000 21385.0 21387.000000 21305.0 21280.00 21212.0
14 NaN NaN NaN NaN NaN NaN
15 NaN NaN NaN NaN NaN NaN
16 NaN NaN NaN NaN NaN NaN
17 NaN NaN NaN NaN NaN NaN
18 56772.416667 56666.0 56678.666667 56677.5 56733.75 56773.0
7/31/1996 8/31/1996 9/30/1996 10/31/1996 ... 6/30/2019 \
0 75144.0 75244.0 75365.00 75516.0 ... 142660.000000
1 72113.0 72168.0 72233.00 72311.0 ... 133879.000000
2 56499.0 56528.0 56612.00 56674.0 ... 101268.000000
3 54183.0 54166.0 54306.00 54543.0 ... 87027.000000
4 45721.0 45916.0 46052.00 46291.0 ... 64361.000000
5 68589.0 68697.0 68816.00 69154.0 ... 121228.000000
6 53002.0 53143.0 53175.00 53277.0 ... 98553.000000
7 58271.0 58347.0 58488.00 58794.0 ... 124516.000000
8 NaN NaN NaN NaN ... 47443.000000
9 79479.0 79434.0 79202.00 79050.0 ... 128104.000000
10 NaN NaN NaN NaN ... 152451.000000
11 43763.0 43709.0 43672.00 43685.0 ... 72533.000000
12 53212.0 53384.0 53363.00 53463.0 ... 74266.000000
13 21234.0 21428.0 21555.00 21638.0 ... 45572.000000
14 NaN NaN NaN NaN ... 57991.000000
15 NaN NaN NaN NaN ... 31232.000000
16 NaN NaN NaN NaN ... 57306.000000
17 NaN NaN NaN NaN ... 41994.000000
18 56767.5 56847.0 56903.25 57033.0 ... 87910.222222
7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \
0 142838.000000 142174.000000 141541.000000 140831.000000 140000.000000
1 134108.000000 133513.000000 133045.000000 132452.000000 131770.000000
2 101367.000000 100903.000000 100459.000000 99975.000000 99351.000000
3 87211.000000 86964.000000 86678.000000 86224.000000 85744.000000
4 64491.000000 64560.000000 64635.000000 64761.000000 64854.000000
5 121333.000000 120912.000000 120495.000000 120215.000000 119828.000000
6 98838.000000 98521.000000 98301.000000 97896.000000 97439.000000
7 124928.000000 124718.000000 124573.000000 124302.000000 123735.000000
8 47677.000000 47656.000000 47670.000000 47657.000000 47507.000000
9 128302.000000 127762.000000 127263.000000 126940.000000 126493.000000
10 153003.000000 152383.000000 151921.000000 151460.000000 151178.000000
11 72464.000000 72474.000000 72403.000000 72078.000000 72021.000000
12 74387.000000 74092.000000 73889.000000 73709.000000 73507.000000
13 45514.000000 45433.000000 45545.000000 45402.000000 45255.000000
14 58111.000000 58447.000000 58804.000000 59618.000000 59790.000000
15 31345.000000 31579.000000 32044.000000 32267.000000 32244.000000
16 57472.000000 57839.000000 58188.000000 58752.000000 59194.000000
17 42054.000000 42106.000000 41858.000000 41665.000000 41452.000000
18 88080.166667 87890.888889 87739.555556 87566.888889 87297.888889
12/31/2019 1/31/2020 2/29/2020 3/31/2020
0 138321.000000 137024.000000 136435.000000 136386.000000
1 130253.000000 129146.000000 128691.000000 128663.000000
2 98085.000000 97242.000000 96910.000000 96980.000000
3 84888.000000 84213.000000 83769.000000 83471.000000
4 64715.000000 64680.000000 64703.000000 64966.000000
5 118787.000000 117834.000000 117513.000000 117597.000000
6 96466.000000 95898.000000 95826.000000 95984.000000
7 122591.000000 121884.000000 121781.000000 121875.000000
8 47143.000000 46872.000000 46779.000000 46706.000000
9 125263.000000 124385.000000 124180.000000 124430.000000
10 149800.000000 148768.000000 148509.000000 148897.000000
11 71946.000000 72218.000000 72291.000000 72383.000000
12 72935.000000 72487.000000 72360.000000 72460.000000
13 45029.000000 44953.000000 45215.000000 45295.000000
14 60082.000000 59911.000000 60450.000000 60440.000000
15 31984.000000 31919.000000 31984.000000 32203.000000
16 59691.000000 60046.000000 60681.000000 61563.000000
17 41032.000000 40543.000000 40114.000000 39782.000000
18 86611.722222 86112.388889 86010.611111 86115.611111
[19 rows x 291 columns]
Data Shape: (19, 291)
AR_mean = hotspringsARSubset.append(littlerockARSubset.iloc[62,:])
AR_mean = AR_mean.append(fayettevilleARSubset.iloc[40,:])
AR_mean = AR_mean.append(searcyARSubset.iloc[18,:])
AR_mean = AR_mean.iloc[7:62,:]
AR_mean.reset_index(drop=True, inplace=True)
AR_metro = ["hot_springs", "little_rock", "fayetteville", "searcy"]
AR_mean.insert(loc=0, column='Metro', value=AR_metro)
print(AR_mean.head(19))
print('Data Shape:', AR_mean.shape)
Metro 1/31/1996 2/29/1996 3/31/1996 4/30/1996 \
0 hot_springs 78546.800000 77961.600 77987.000000 78008.400000
1 little_rock 84050.083333 84245.125 84856.102041 85225.836735
2 fayetteville 93923.137931 93050.700 93128.100000 93273.733333
3 searcy 56772.416667 56666.000 56678.666667 56677.500000
5/31/1996 6/30/1996 7/31/1996 8/31/1996 9/30/1996 ... \
0 78266.400000 78167.600000 77778.600000 77367.200000 77167.800000 ...
1 85586.387755 85833.530612 86127.183673 86351.428571 86542.897959 ...
2 93489.100000 93827.666667 94222.666667 94650.766667 94972.866667 ...
3 56733.750000 56773.000000 56767.500000 56847.000000 56903.250000 ...
6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 \
0 130194.714286 130269.714286 130643.285714 131520.857143 132674.000000
1 137957.419355 137967.725806 138232.258065 138562.548387 139092.919355
2 170895.275000 172140.525000 173434.375000 174524.850000 175459.000000
3 87910.222222 88080.166667 87890.888889 87739.555556 87566.888889
11/30/2019 12/31/2019 1/31/2020 2/29/2020 3/31/2020
0 133555.714286 134451.857143 135230.142857 136497.142857 137892.571429
1 139401.790323 139585.387097 139714.419355 139935.032258 140491.000000
2 176282.075000 176944.900000 177802.650000 178468.000000 178978.850000
3 87297.888889 86611.722222 86112.388889 86010.611111 86115.611111
[4 rows x 292 columns]
Data Shape: (4, 292)
# Transpose the dataframe
ARmean_transposed = AR_mean.T
ARmean_transposed = ARmean_transposed.reset_index()
ARmean_transposed = ARmean_transposed.rename(columns=ARmean_transposed.iloc[0]).drop(ARmean_transposed.index[0])
ARmean_transposed = ARmean_transposed.rename(columns={"Metro": "date"})
ARmean_transposed['date'] = pd.DatetimeIndex(ARmean_transposed['date'])
ARmean_transposed['hot_springs'] = pd.to_numeric(ARmean_transposed['hot_springs'])
ARmean_transposed['little_rock'] = pd.to_numeric(ARmean_transposed['little_rock'])
ARmean_transposed['fayetteville'] = pd.to_numeric(ARmean_transposed['fayetteville'])
ARmean_transposed['searcy'] = pd.to_numeric(ARmean_transposed['searcy'])
print("ARmean_transposed Dataframe : ")
print(ARmean_transposed)
print(ARmean_transposed.dtypes)
ARmean_transposed Dataframe :
date hot_springs little_rock fayetteville searcy
1 1996-01-31 78546.800000 84050.083333 93923.137931 56772.416667
2 1996-02-29 77961.600000 84245.125000 93050.700000 56666.000000
3 1996-03-31 77987.000000 84856.102041 93128.100000 56678.666667
4 1996-04-30 78008.400000 85225.836735 93273.733333 56677.500000
5 1996-05-31 78266.400000 85586.387755 93489.100000 56733.750000
.. ... ... ... ... ...
287 2019-11-30 133555.714286 139401.790323 176282.075000 87297.888889
288 2019-12-31 134451.857143 139585.387097 176944.900000 86611.722222
289 2020-01-31 135230.142857 139714.419355 177802.650000 86112.388889
290 2020-02-29 136497.142857 139935.032258 178468.000000 86010.611111
291 2020-03-31 137892.571429 140491.000000 178978.850000 86115.611111
[291 rows x 5 columns]
date datetime64[ns]
hot_springs float64
little_rock float64
fayetteville float64
searcy float64
dtype: object
AR_roi = ((AR_mean['3/31/2020'] - AR_mean['1/31/1996']) / AR_mean['1/31/1996'])
print("24 Year ROI (1996-2020)")
print("Hot Springs:", AR_roi[0])
print("Little Rock:", AR_roi[1])
print("Fayetteville:", AR_roi[2])
print("Searcy:", AR_roi[3])
24 Year ROI (1996-2020) Hot Springs: 0.7555466477128465 Little Rock: 0.6715152969310958 Fayetteville: 0.9055884837601987 Searcy: 0.516856532930947
fig, ax = plt.subplots(figsize=(12, 10))
sns.set_style('darkgrid')
sns.lineplot(x="date", y="hot_springs", label = "Hot Springs", data=ARmean_transposed)
sns.lineplot(x="date", y="little_rock", label = "Little Rock", data=ARmean_transposed)
sns.lineplot(x="date", y="fayetteville", label = "Fayetteville", data=ARmean_transposed)
sns.lineplot(x="date", y="searcy", label = "Searcy", data=ARmean_transposed)
plt.title('Arkansas Home Values')
ax.set(xlabel= None , ylabel= None)
plt.show()
Based on the initial analysis of 5-year growth rates and map showing high growth clusters, Tampa-Florida, Atlanta-Georgia, and Dallas-Texas have emerged as very promising options for investment properties.
Following a similar procedure described above for the analysis of Arkansas metro areas. The metro areas of our three target cities were compared. Tampa had the had the highest ROI at 172.1% over 24 years. Atlanta and Dallas were similar in ROI, 106.3% and 105.6% respectively.
24 Year ROI (1996-2020)
Using the fb Prophet Forecaster tool developed by Facebook the timeseries data for each metro area can be submitted as the input for a forecast model of SFR values for the next three years. The growth for all three metro areas are comparable, approximately 16% to 20% over three years from 2020-2023. Dallas has the highest projected ROI of 20.1%. Tampa has the next highest, 18.8%. Atlanta has the lowest ROI of 16.7%. The other feature for the forecasts was a 95% confidence interval. Tampa had quite a wide margin for error, which included a lower boundary illustrating a possible loss. The Atlanta forecast had a narrower margin of error, where the worst case scenario shows a plateau in 2022 and 2023. Dallas had the tightest margin of error that only showed growth within the 95% confidence interval. Given the extremely high ROI in the Dallas Metro area and the low volatility, Dallas is the most promising metro area for SREIT’s investment activities.
# Create subsets for the most promising metro areas
atlantaSubset = data[(data["Metro"] == "Atlanta-Sandy Springs-Roswell")]
dallasSubset = data[(data["Metro"] == "Dallas-Fort Worth-Arlington")]
tampaSubset = data[(data["Metro"] == "Tampa-St. Petersburg-Clearwater")]
# Write the table to a csv
atlantaSubset.to_csv(r'atlantaSubset.csv', index = False)
dallasSubset.to_csv(r'dallasSubset.csv', index = False)
tampaSubset.to_csv(r'tampaSubset.csv', index = False)
print(atlantaSubset.head(5))
print('Data Shape:', atlantaSubset.shape)
print(dallasSubset.head(5))
print('Data Shape:', dallasSubset.shape)
print(tampaSubset.head(5))
print('Data Shape:', tampaSubset.shape)
RegionID SizeRank RegionName RegionType StateName State City \
37 71067 37 30349 Zip GA GA Riverdale
52 70829 52 30044 Zip GA GA Lawrenceville
58 70828 58 30043 Zip GA GA Lawrenceville
65 70874 65 30096 Zip GA GA Duluth
69 70810 69 30024 Zip GA GA Suwanee
Metro CountyName 1/31/1996 ... 6/30/2019 \
37 Atlanta-Sandy Springs-Roswell Clayton County 94712.0 ... 155352
52 Atlanta-Sandy Springs-Roswell Gwinnett County 117878.0 ... 214618
58 Atlanta-Sandy Springs-Roswell Gwinnett County 144393.0 ... 250522
65 Atlanta-Sandy Springs-Roswell Gwinnett County 132737.0 ... 251000
69 Atlanta-Sandy Springs-Roswell Gwinnett County 191449.0 ... 374618
7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 12/31/2019 \
37 156398 157617 158767 159567 160335 161267
52 215612 216716 217763 218565 219747 221045
58 251286 252229 253167 253795 254686 255603
65 252078 253332 254363 255063 256239 257468
69 375162 375868 376408 376643 377282 377977
1/31/2020 2/29/2020 3/31/2020
37 162426 163854 164941
52 222308 223252 224221
58 256600 257463 258359
65 258732 259524 260375
69 378646 378626 378761
[5 rows x 300 columns]
Data Shape: (208, 300)
RegionID SizeRank RegionName RegionType StateName State City \
39 90654 39 75052 Zip TX TX Grand Prairie
75 90823 75 75287 Zip TX TX Dallas
84 90795 84 75243 Zip TX TX Dallas
98 90665 98 75067 Zip TX TX Lewisville
137 91221 137 76063 Zip TX TX Mansfield
Metro CountyName 1/31/1996 ... 6/30/2019 \
39 Dallas-Fort Worth-Arlington Dallas County 97847.0 ... 225372
75 Dallas-Fort Worth-Arlington Dallas County 186128.0 ... 344110
84 Dallas-Fort Worth-Arlington Dallas County 135626.0 ... 306070
98 Dallas-Fort Worth-Arlington Denton County 112396.0 ... 256783
137 Dallas-Fort Worth-Arlington Tarrant County 142524.0 ... 290855
7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 12/31/2019 \
39 226092 226994 228161 229165 229953 230406
75 344511 344939 345412 345372 345526 345519
84 306334 306498 307570 307414 307389 305673
98 257660 258235 258624 258326 258334 258406
137 291373 292134 292932 293678 293920 294143
1/31/2020 2/29/2020 3/31/2020
39 230866 231422 232103
75 345882 346390 347554
84 304756 303624 302736
98 258587 259272 260575
137 294218 294876 295757
[5 rows x 300 columns]
Data Shape: (268, 300)
RegionID SizeRank RegionName RegionType StateName State City \
167 72737 168 33647 Zip FL FL Tampa
475 72653 476 33511 Zip FL FL Brandon
505 399576 506 33578 Zip FL FL Riverview
621 73162 622 34668 Zip FL FL Port Richey
697 73181 698 34698 Zip FL FL Dunedin
Metro CountyName 1/31/1996 ... \
167 Tampa-St. Petersburg-Clearwater Hillsborough County NaN ...
475 Tampa-St. Petersburg-Clearwater Hillsborough County 95029.0 ...
505 Tampa-St. Petersburg-Clearwater Hillsborough County 93146.0 ...
621 Tampa-St. Petersburg-Clearwater Pasco County 58101.0 ...
697 Tampa-St. Petersburg-Clearwater Pinellas County 95513.0 ...
6/30/2019 7/31/2019 8/31/2019 9/30/2019 10/31/2019 11/30/2019 \
167 316281 317079 317332 317358 317677 318227
475 232302 233269 233905 234404 234981 235531
505 222895 223603 224062 224411 224914 225376
621 125746 126699 127598 128501 129221 130245
697 288328 289414 290194 291477 292720 294634
12/31/2019 1/31/2020 2/29/2020 3/31/2020
167 318998 319526 320429 321184
475 236086 236610 237360 238124
505 225952 226277 226875 227469
621 131210 132202 133474 135385
697 296115 297536 299875 302183
[5 rows x 300 columns]
Data Shape: (130, 300)
# Limit to only the property value data
atlantaSubset = atlantaSubset.iloc[:, 9:300]
dallasSubset = dallasSubset.iloc[:, 9:300]
tampaSubset = tampaSubset.iloc[:, 9:300]
# Add a row of mean values
atl_mean = atlantaSubset.mean()
atlantaSubset = atlantaSubset.append(atl_mean, ignore_index=True)
dal_mean = dallasSubset.mean()
dallasSubset = dallasSubset.append(dal_mean, ignore_index=True)
tam_mean = tampaSubset.mean()
tampaSubset = tampaSubset.append(tam_mean, ignore_index=True)
print('Atlanta Data Shape:', atlantaSubset.shape)
print('Dallas Data Shape:', dallasSubset.shape)
print('Tampa Data Shape:', tampaSubset.shape)
Atlanta Data Shape: (209, 291) Dallas Data Shape: (269, 291) Tampa Data Shape: (131, 291)
# Format a new dataframe of mean values
Metro_mean = tampaSubset.append(atlantaSubset.iloc[208,:])
Metro_mean = Metro_mean.append(dallasSubset.iloc[268,:])
Metro_mean = Metro_mean.iloc[130:268,:]
Metro_mean.reset_index(drop=True, inplace=True)
# Insert a column to identify metro areas
Metro_names = ["tampa", "atlanta", "dallas"]
Metro_mean.insert(loc=0, column='Metro', value=Metro_names)
print(Metro_mean)
print('Data Shape:', Metro_mean.shape)
Metro 1/31/1996 2/29/1996 3/31/1996 4/30/1996 \
0 tampa 101949.484375 101725.403101 101698.054264 101689.542636
1 atlanta 127873.815029 128239.505747 128240.640000 128737.680000
2 dallas 133135.096447 135648.156566 135824.151515 136092.287879
5/31/1996 6/30/1996 7/31/1996 8/31/1996 9/30/1996 \
0 101761.542636 101902.255814 101997.434109 102089.317829 102130.790698
1 129217.320000 129745.245714 130241.702857 130746.200000 131243.131429
2 136431.893939 136618.166667 136725.833333 136993.237374 137321.540404
... 6/30/2019 7/31/2019 8/31/2019 9/30/2019 \
0 ... 266077.246154 267063.423077 267805.838462 268678.446154
1 ... 256410.163462 257091.442308 257914.182692 258661.884615
2 ... 268202.451493 268980.264925 269790.832090 270898.768657
10/31/2019 11/30/2019 12/31/2019 1/31/2020 2/29/2020 \
0 269583.053846 270773.215385 271906.000000 273271.184615 275205.961538
1 259321.639423 259892.788462 260660.177885 261631.778846 262761.975962
2 271486.798507 271985.376866 272088.276119 272508.208955 272990.757463
3/31/2020
0 277373.700000
1 263753.173077
2 273692.216418
[3 rows x 292 columns]
Data Shape: (3, 292)
# Transpose the dataframe
Metro_mean_transposed = Metro_mean.T
Metro_mean_transposed = Metro_mean_transposed.reset_index()
Metro_mean_transposed = Metro_mean_transposed.rename(columns=Metro_mean_transposed.iloc[0]).drop(Metro_mean_transposed.index[0])
Metro_mean_transposed = Metro_mean_transposed.rename(columns={"Metro": "date"})
Metro_mean_transposed['date'] = pd.DatetimeIndex(Metro_mean_transposed['date'])
Metro_mean_transposed['tampa'] = pd.to_numeric(Metro_mean_transposed['tampa'])
Metro_mean_transposed['atlanta'] = pd.to_numeric(Metro_mean_transposed['atlanta'])
Metro_mean_transposed['dallas'] = pd.to_numeric(Metro_mean_transposed['dallas'])
print("Metro_mean_transposed Dataframe : ")
print(Metro_mean_transposed)
print(Metro_mean_transposed.dtypes)
Metro_mean_transposed Dataframe :
date tampa atlanta dallas
1 1996-01-31 101949.484375 127873.815029 133135.096447
2 1996-02-29 101725.403101 128239.505747 135648.156566
3 1996-03-31 101698.054264 128240.640000 135824.151515
4 1996-04-30 101689.542636 128737.680000 136092.287879
5 1996-05-31 101761.542636 129217.320000 136431.893939
.. ... ... ... ...
287 2019-11-30 270773.215385 259892.788462 271985.376866
288 2019-12-31 271906.000000 260660.177885 272088.276119
289 2020-01-31 273271.184615 261631.778846 272508.208955
290 2020-02-29 275205.961538 262761.975962 272990.757463
291 2020-03-31 277373.700000 263753.173077 273692.216418
[291 rows x 4 columns]
date datetime64[ns]
tampa float64
atlanta float64
dallas float64
dtype: object
# Calculate 24-Year ROI
Metro_roi = ((Metro_mean['3/31/2020'] - Metro_mean['1/31/1996']) / Metro_mean['1/31/1996'])
print("24 Year ROI (1996-2020)")
print("Tampa:", Metro_roi[0])
print("Atlanta:", Metro_roi[1])
print("Dallas:", Metro_roi[2])
24 Year ROI (1996-2020) Tampa: 1.7206974287357695 Atlanta: 1.0626050221252115 Dallas: 1.0557480613497043
fig, ax = plt.subplots(figsize=(12, 10))
sns.set_style('darkgrid')
sns.lineplot(x="date", y="tampa", label = "Tampa", data=Metro_mean_transposed)
sns.lineplot(x="date", y="atlanta", label = "Atlanta", data=Metro_mean_transposed)
sns.lineplot(x="date", y="dallas", label = "Dallas", data=Metro_mean_transposed)
plt.title('Home Values')
ax.set(xlabel= None , ylabel= None)
plt.show()
tampa = Metro_mean_transposed[["date", "tampa"]]
atlanta = Metro_mean_transposed[["date", "atlanta"]]
dallas = Metro_mean_transposed[["date", "dallas"]]
tampa.columns = ['ds', 'y']
atlanta.columns = ['ds', 'y']
dallas.columns = ['ds', 'y']
print("Tampa Time Series\n")
print(tampa.head())
print("\nTampa Time Series Data Types:\n", tampa.dtypes)
print("\nAtlanta Time Series\n")
print(atlanta.head())
print("\nAtlanta Time Series Data Types:\n", atlanta.dtypes)
print("\nDallas Time Series\n")
print(dallas.head())
print("\nDallas Time Series Data Types:\n", dallas.dtypes)
Tampa Time Series
ds y
1 1996-01-31 101949.484375
2 1996-02-29 101725.403101
3 1996-03-31 101698.054264
4 1996-04-30 101689.542636
5 1996-05-31 101761.542636
Tampa Time Series Data Types:
ds datetime64[ns]
y float64
dtype: object
Atlanta Time Series
ds y
1 1996-01-31 127873.815029
2 1996-02-29 128239.505747
3 1996-03-31 128240.640000
4 1996-04-30 128737.680000
5 1996-05-31 129217.320000
Atlanta Time Series Data Types:
ds datetime64[ns]
y float64
dtype: object
Dallas Time Series
ds y
1 1996-01-31 133135.096447
2 1996-02-29 135648.156566
3 1996-03-31 135824.151515
4 1996-04-30 136092.287879
5 1996-05-31 136431.893939
Dallas Time Series Data Types:
ds datetime64[ns]
y float64
dtype: object
# set the uncertainty interval to 95% (the Prophet default is 80%)
tampa_model = Prophet(interval_width=0.95, weekly_seasonality=True, daily_seasonality=True)
atlanta_model = Prophet(interval_width=0.95, weekly_seasonality=True, daily_seasonality=True)
dallas_model = Prophet(interval_width=0.95, weekly_seasonality=True, daily_seasonality=True)
print(tampa_model.fit(tampa))
print(atlanta_model.fit(atlanta))
print(dallas_model.fit(dallas))
<fbprophet.forecaster.Prophet object at 0x0000022B7D6C9548> <fbprophet.forecaster.Prophet object at 0x0000022B7D6C9388> <fbprophet.forecaster.Prophet object at 0x0000022B72D4DE88>
tampa_future_dates = tampa_model.make_future_dataframe(periods=36, freq='MS')
atlanta_future_dates = atlanta_model.make_future_dataframe(periods=36, freq='MS')
dallas_future_dates = dallas_model.make_future_dataframe(periods=36, freq='MS')
tampa_forecast = tampa_model.predict(tampa_future_dates)
atlanta_forecast = atlanta_model.predict(atlanta_future_dates)
dallas_forecast = dallas_model.predict(dallas_future_dates)
print("\n Tampa Forecast")
print(tampa_forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
print("\n Atlanta Forecast")
print(atlanta_forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
print("\n Dallas Forecast")
print(dallas_forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
Tampa Forecast
ds yhat yhat_lower yhat_upper
322 2022-11-01 323118.309175 253732.149433 387406.753273
323 2022-12-01 324751.225091 252363.477536 391282.362862
324 2023-01-01 326235.388529 252609.829490 397444.362584
325 2023-02-01 327985.986988 250764.365973 401007.159341
326 2023-03-01 329104.325114 249025.996413 404802.364577
Atlanta Forecast
ds yhat yhat_lower yhat_upper
322 2022-11-01 303059.762461 274872.307853 335884.424245
323 2022-12-01 304708.676038 275892.266996 338655.895712
324 2023-01-01 306031.025751 276463.634362 341596.875181
325 2023-02-01 307113.396523 276041.137551 344892.126853
326 2023-03-01 307892.969240 276394.979296 345804.118893
Dallas Forecast
ds yhat yhat_lower yhat_upper
322 2022-11-01 326252.275464 314034.507369 337742.741925
323 2022-12-01 328465.091291 315786.796804 340619.765424
324 2023-01-01 330688.774884 316815.414070 343779.750784
325 2023-02-01 331200.819008 317027.621674 344516.223510
326 2023-03-01 331308.636502 317056.637128 345599.092560
tampa_model.plot(tampa_forecast, uncertainty=True)
atlanta_model.plot(atlanta_forecast, uncertainty=True)
dallas_model.plot(dallas_forecast, uncertainty=True)
tampa_model.plot_components(tampa_forecast)
atlanta_model.plot_components(atlanta_forecast)
dallas_model.plot_components(dallas_forecast)
Given that the recommendation for investment will be three zip codes in Dallas, an economic analysis was conducted to examine the effects of other economic forces on SFR values. The three additional socio-economic measures that were considered were Employed Population, GDP, and Total Population between 2005 and 2020. The employed population of Dallas combines the change in overall population with the participation rate. The GDP measurement is the Gross Domestic Product localized in the Dallas metro area. The total population shows the change over time in the population of Dallas.
These three measurements were compared to an annualized version of the SFR values for Dallas starting. In order to chart the relative growth of these four indicators, each one was divided by the initial value so that they all started with a representative value of 1.0 in 2005. Population growth and employed population growth are correlated but neither seems to have any correlation with SFR values. There seems to be a strong parallel between GDP growth and the growth in SFR values. Both GDP and home values were affected by the downturn in 2008, but GDP recovered in 1 year while it took 4 years for home values to recover. GDP forecasts for Dallas will likely be a strong indicator for the direction of home values in Dallas.
# Import and clean the socio economic data from the Passport report
dalStats = pd.read_excel (r'DALLAS_Passport_Stats_16-08-2021_0525_GMT.xls')
# Manipulate the rows and columns
dalStats = dalStats.iloc[4:32,:]
dalStats.columns = dalStats.iloc[0]
dalStats = dalStats.iloc[pd.RangeIndex(len(dalStats)).drop(0)]
dalStats = dalStats.drop(labels=[5,7,9], axis=0)
dalStats = dalStats.drop(labels=range(10, 31), axis=0)
dalStats = dalStats.drop(["Geography", "Data Type", "Unit", "Current Constant", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004"], axis=1)
# Rename columns
mapping = {dalStats.columns[0]:'Category'
, dalStats.columns[1]: '2005-1-31'
, dalStats.columns[2]: '2006-1-31'
, dalStats.columns[3]: '2007-1-31'
, dalStats.columns[4]: '2008-1-31'
, dalStats.columns[5]: '2009-1-31'
, dalStats.columns[6]: '2010-1-31'
, dalStats.columns[7]: '2011-1-31'
, dalStats.columns[8]: '2012-1-31'
, dalStats.columns[9]: '2013-1-31'
, dalStats.columns[10]: '2014-1-31'
, dalStats.columns[11]: '2015-1-31'
, dalStats.columns[12]: '2016-1-31'
, dalStats.columns[13]: '2017-1-31'
, dalStats.columns[14]: '2018-1-31'
, dalStats.columns[15]: '2019-1-31'
, dalStats.columns[16]: '2020-1-31'}
dalStats = dalStats.rename(columns=mapping)
# Reset index
dalStats.reset_index(drop=True, inplace=True)
print("dalStats Dataframe : ")
print(dalStats)
print(dalStats.dtypes)
dalStats Dataframe : 4 Category 2005-1-31 2006-1-31 2007-1-31 2008-1-31 2009-1-31 \ 0 Employed Population 2868.1 2936.2 2974.2 2993.1 2939.2 1 GDP 312813.3 337971.5 355337.1 373923.6 350605.2 2 Total Population 5758.0 5917.1 6049.1 6183.3 6323.0 4 2010-1-31 2011-1-31 2012-1-31 2013-1-31 2014-1-31 2015-1-31 \ 0 3033.9 3118.7 3198.4 3263.9 3360.6 3432.5 1 372454.6 395674.9 420595.3 451590.3 478292.0 492695.1 2 6438.2 6567.4 6690.0 6804.2 6934.1 7065.9 4 2016-1-31 2017-1-31 2018-1-31 2019-1-31 2020-1-31 0 3551.1 3658.3 3747.5 3812.1 3644.5 1 504196.0 535850.8 567941.2 595070.1 585876.5 2 7187.3 7297.5 7394.6 7479.4 7555.0 4 Category object 2005-1-31 float64 2006-1-31 float64 2007-1-31 float64 2008-1-31 float64 2009-1-31 float64 2010-1-31 float64 2011-1-31 float64 2012-1-31 float64 2013-1-31 float64 2014-1-31 float64 2015-1-31 float64 2016-1-31 float64 2017-1-31 float64 2018-1-31 float64 2019-1-31 float64 2020-1-31 float64 dtype: object
# Transpose the dataframe
dalStats_transposed = dalStats.T
dalStats_transposed = dalStats_transposed.reset_index()
dalStats_transposed = dalStats_transposed.rename(columns=dalStats_transposed.iloc[0]).drop(dalStats_transposed.index[0])
dalStats_transposed = dalStats_transposed.rename(columns={"Category": "date"})
dalStats_transposed['date'] = pd.DatetimeIndex(dalStats_transposed['date'])
dalStats_transposed['Employed Population'] = pd.to_numeric(dalStats_transposed['Employed Population'])
dalStats_transposed['GDP'] = pd.to_numeric(dalStats_transposed['GDP'])
dalStats_transposed['Total Population'] = pd.to_numeric(dalStats_transposed['Total Population'])
# Reset index
dalStats_transposed.reset_index(drop=True, inplace=True)
print("dalStats_transposed Dataframe : ")
print(dalStats_transposed)
print(dalStats_transposed.dtypes)
dalStats_transposed Dataframe :
date Employed Population GDP Total Population
0 2005-01-31 2868.1 312813.3 5758.0
1 2006-01-31 2936.2 337971.5 5917.1
2 2007-01-31 2974.2 355337.1 6049.1
3 2008-01-31 2993.1 373923.6 6183.3
4 2009-01-31 2939.2 350605.2 6323.0
5 2010-01-31 3033.9 372454.6 6438.2
6 2011-01-31 3118.7 395674.9 6567.4
7 2012-01-31 3198.4 420595.3 6690.0
8 2013-01-31 3263.9 451590.3 6804.2
9 2014-01-31 3360.6 478292.0 6934.1
10 2015-01-31 3432.5 492695.1 7065.9
11 2016-01-31 3551.1 504196.0 7187.3
12 2017-01-31 3658.3 535850.8 7297.5
13 2018-01-31 3747.5 567941.2 7394.6
14 2019-01-31 3812.1 595070.1 7479.4
15 2020-01-31 3644.5 585876.5 7555.0
date datetime64[ns]
Employed Population float64
GDP float64
Total Population float64
dtype: object
Metro_mean_annual = Metro_mean_transposed.drop(labels=range(1, 109), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(110, 121), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(122, 133), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(134, 145), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(146, 157), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(158, 169), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(170, 181), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(182, 193), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(194, 205), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(206, 217), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(218, 229), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(230, 241), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(242, 253), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(254, 265), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(266, 277), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=range(278, 289), axis=0)
Metro_mean_annual = Metro_mean_annual.drop(labels=[290,291], axis=0)
# Reset index
Metro_mean_annual.reset_index(drop=True, inplace=True)
print("Metro_mean_annual Dataframe : ")
print(Metro_mean_annual)
print(Metro_mean_annual.dtypes)
Metro_mean_annual Dataframe :
date tampa atlanta dallas
0 2005-01-31 206626.053846 203031.608466 169894.350649
1 2006-01-31 259287.061538 213053.652632 168073.264822
2 2007-01-31 274329.384615 220089.602094 174007.456693
3 2008-01-31 245187.276923 219202.104712 176803.081395
4 2009-01-31 199496.192308 200055.869110 169121.170543
5 2010-01-31 174853.946154 180419.664948 169987.666667
6 2011-01-31 161860.738462 166533.454082 163917.019231
7 2012-01-31 149336.546154 153991.985149 158909.946768
8 2013-01-31 160161.215385 160629.300493 166008.813688
9 2014-01-31 181998.638462 181273.112195 179580.460377
10 2015-01-31 193425.361538 192971.665049 192071.206767
11 2016-01-31 209067.700000 204739.898551 211192.875940
12 2017-01-31 228236.469231 216126.375000 231158.258427
13 2018-01-31 245758.061538 234310.942308 249473.313433
14 2019-01-31 263970.076923 251666.355769 265053.902985
15 2020-01-31 273271.184615 261631.778846 272508.208955
date datetime64[ns]
tampa float64
atlanta float64
dallas float64
dtype: object
# Create a dataframe of proportional changes in socio-economic data
dalStats_transposedX = dalStats_transposed.merge(Metro_mean_annual, on='date', how='left', indicator=True)
dalStats_transposedX = dalStats_transposedX.drop(["tampa", "atlanta", "_merge"], axis=1)
dalStats_transposedX = dalStats_transposedX.rename(columns={"dallas": "mean_SFR"})
# Re-scale the data for comparison
dalStats_transposedX["Employed Population"] = dalStats_transposedX["Employed Population"] / 2868.1
dalStats_transposedX["GDP"] = dalStats_transposedX["GDP"] / 312813.3
dalStats_transposedX["Total Population"] = dalStats_transposedX["Total Population"] / 5758.0
dalStats_transposedX["mean_SFR"] = dalStats_transposedX["mean_SFR"] / 169894.350649
print("dalStats_transposedX Dataframe : ")
print(dalStats_transposedX)
print(dalStats_transposedX.dtypes)
dalStats_transposedX Dataframe :
date Employed Population GDP Total Population mean_SFR
0 2005-01-31 1.000000 1.000000 1.000000 1.000000
1 2006-01-31 1.023744 1.080426 1.027631 0.989281
2 2007-01-31 1.036993 1.135940 1.050556 1.024210
3 2008-01-31 1.043583 1.195357 1.073862 1.040665
4 2009-01-31 1.024790 1.120813 1.098124 0.995449
5 2010-01-31 1.057808 1.190661 1.118131 1.000549
6 2011-01-31 1.087375 1.264892 1.140570 0.964817
7 2012-01-31 1.115163 1.344557 1.161862 0.935346
8 2013-01-31 1.138001 1.443642 1.181695 0.977130
9 2014-01-31 1.171716 1.529001 1.204255 1.057013
10 2015-01-31 1.196785 1.575045 1.227145 1.130533
11 2016-01-31 1.238137 1.611811 1.248229 1.243084
12 2017-01-31 1.275513 1.713005 1.267367 1.360600
13 2018-01-31 1.306614 1.815592 1.284231 1.468403
14 2019-01-31 1.329138 1.902317 1.298958 1.560110
15 2020-01-31 1.270702 1.872927 1.312088 1.603986
date datetime64[ns]
Employed Population float64
GDP float64
Total Population float64
mean_SFR float64
dtype: object
fig, ax = plt.subplots(figsize=(12, 10))
sns.set_style('darkgrid')
sns.lineplot(x="date", y="Employed Population", label = "Employed Pop Growth", data=dalStats_transposedX)
sns.lineplot(x="date", y="GDP", label = "GDP Growth", data=dalStats_transposedX)
sns.lineplot(x="date", y="Total Population", label = "Total Pop Growth", data=dalStats_transposedX)
sns.lineplot(x="date", y="mean_SFR", label = "SFR Value Growth", data=dalStats_transposedX)
plt.title('Socio-Economic Growth Rates')
ax.set(xlabel= None , ylabel= None)
plt.show()
Dallas emerged as an ideal metro area for SREIT to invest in. A third of the top 30 zip codes when using 5-year ROI as a criterion, where located in the Dallas area. A map showing the top quartile of zip codes showed a cluster of high growth zip codes in Dallas. Additionally, the forecast provided by fbProphet showed strong growth and low volitility.
The recommendation to SREIT is to invest in Single Family Residence properties in the following zip codes; 75226-Dallas, 75206-Dallas, and 75141-Hutchins. These zip codes experienced very high growth over the past 8 years, the highest being 75226 at 265.3%. Further, it is recommended to forecast both SFR values and Local GDP on an annual basis to predict possible downturns.
8 Year ROI (2012-2020)
# Read in the Dallas Data file
dallasData = pd.read_csv('dallasSubset.csv')
# Select relavant attributes
dallasSubsetID = dallasData[['RegionName', 'City']]
dallasSubsetX = dallasData.iloc[:, 204:300]
# Concatonate the columns to one data frame
dallas_8years = pd.concat([dallasSubsetID, dallasSubsetX.reindex(dallasSubsetID.index)], axis=1)
print(dallas_8years.tail(20))
print('Data Shape:', dallas_8years.shape)
RegionName City 4/30/2012 5/31/2012 6/30/2012 7/31/2012 \
248 75422 Campbell 90660.0 90761.0 90903.0 91092.0
249 75423 Celeste 81210.0 81464.0 81682.0 81942.0
250 76476 Tolar 143654.0 144011.0 143833.0 143615.0
251 76651 Italy 87593.0 86780.0 86590.0 86580.0
252 75246 Dallas 96930.0 97052.0 97050.0 97706.0
253 76487 Poolville 159682.0 160014.0 160298.0 159932.0
254 76093 Rio Vista 103518.0 101994.0 101311.0 102334.0
255 76035 Cresson 208722.0 210255.0 210725.0 211193.0
256 76064 Maypearl 115115.0 115618.0 115884.0 116793.0
257 76670 Milford 81542.0 81175.0 80861.0 79944.0
258 75132 Fate 170395.0 169459.0 168444.0 170266.0
259 76077 Rainbow 181330.0 181643.0 182545.0 183336.0
260 76070 Nemo 231314.0 231413.0 232180.0 233107.0
261 76041 Forreston 57992.0 57353.0 56790.0 56239.0
262 75101 Bardwell 66405.0 66070.0 65798.0 65030.0
263 75164 Josephine 135298.0 132978.0 130882.0 129858.0
264 75247 Dallas 63601.0 64413.0 65661.0 66486.0
265 76623 Italy 75081.0 74140.0 73564.0 72227.0
266 87512 Fort Worth NaN NaN NaN NaN
267 75443 Wolfe City 64460.0 64310.0 64202.0 64149.0
8/31/2012 9/30/2012 10/31/2012 11/30/2012 ... 6/30/2019 7/31/2019 \
248 91444.0 92079.0 92699.0 93372.0 ... 175336 175537
249 82140.0 82538.0 83055.0 83513.0 ... 156399 157288
250 143754.0 143751.0 143484.0 143225.0 ... 216535 217095
251 87659.0 87734.0 87680.0 87395.0 ... 145116 146796
252 98250.0 98649.0 99194.0 100177.0 ... 327004 329233
253 161775.0 164560.0 167134.0 168733.0 ... 278116 279519
254 102290.0 102126.0 101190.0 100741.0 ... 194106 195111
255 212564.0 214330.0 215523.0 215524.0 ... 329797 330777
256 118231.0 118077.0 117976.0 117139.0 ... 206468 208849
257 79964.0 79930.0 80012.0 79695.0 ... 138725 140483
258 171958.0 173495.0 174985.0 176321.0 ... 255809 256148
259 184194.0 184793.0 184925.0 184955.0 ... 263277 264029
260 234198.0 235244.0 235723.0 235884.0 ... 341039 342594
261 56574.0 57194.0 57841.0 58746.0 ... 132736 136537
262 65020.0 65449.0 66438.0 66972.0 ... 120378 122477
263 130923.0 131237.0 131513.0 129257.0 ... 227984 229668
264 67114.0 66424.0 66933.0 67985.0 ... 161684 162669
265 71966.0 71275.0 71221.0 70947.0 ... 133798 136861
266 NaN NaN NaN NaN ... 362944 362726
267 64317.0 64212.0 63877.0 63638.0 ... 102927 103267
8/31/2019 9/30/2019 10/31/2019 11/30/2019 12/31/2019 1/31/2020 \
248 175623 175988 176770 178108 179415 180880
249 158078 158372 157626 157506 157463 158783
250 218140 220208 219530 218837 218177 217261
251 148637 150113 151973 152757 153834 153920
252 330413 331588 332614 334071 336084 337555
253 281732 283425 285333 286934 290274 294328
254 194921 195538 195446 196406 196890 197767
255 330149 329735 328537 327282 325589 324878
256 210839 212454 212866 212720 212527 213056
257 142760 144342 145990 146909 148587 149044
258 256796 259416 261980 262918 262018 259337
259 264488 265131 265229 265403 265625 266141
260 343841 345900 347922 350157 350798 351524
261 139390 140314 141092 141845 143522 144888
262 125410 126797 127934 129004 131110 133553
263 230308 230572 230349 229630 229775 230174
264 163347 162760 162950 161813 161057 158345
265 139534 140826 142184 143704 146062 146992
266 361338 361322 361058 362857 365984 368948
267 103639 104107 104810 105310 105680 105832
2/29/2020 3/31/2020
248 182055 184041
249 159858 161384
250 215095 212702
251 153913 153177
252 339325 340427
253 298428 300931
254 198489 200690
255 323838 322124
256 214353 216802
257 149148 148052
258 257035 252296
259 266214 265927
260 351559 352046
261 146201 146602
262 135487 136271
263 231314 232111
264 155436 152259
265 147358 145880
266 372002 372651
267 105882 105685
[20 rows x 98 columns]
Data Shape: (268, 98)
# Remove rows with missing data
dallas_8years = dallas_8years.drop(labels=[157, 217, 238, 244, 266], axis=0)
# Reset index
Metro_mean_annual.reset_index(drop=True, inplace=True)
print(dallas_8years.tail(20))
print('Data Shape:', dallas_8years.shape)
print('Count rows with missing data:', dallas_8years['4/30/2012'].isnull().sum())
RegionName City 4/30/2012 5/31/2012 6/30/2012 7/31/2012 \
247 76462 Lipan 188766.0 189416.0 189239.0 189057.0
248 75422 Campbell 90660.0 90761.0 90903.0 91092.0
249 75423 Celeste 81210.0 81464.0 81682.0 81942.0
250 76476 Tolar 143654.0 144011.0 143833.0 143615.0
251 76651 Italy 87593.0 86780.0 86590.0 86580.0
252 75246 Dallas 96930.0 97052.0 97050.0 97706.0
253 76487 Poolville 159682.0 160014.0 160298.0 159932.0
254 76093 Rio Vista 103518.0 101994.0 101311.0 102334.0
255 76035 Cresson 208722.0 210255.0 210725.0 211193.0
256 76064 Maypearl 115115.0 115618.0 115884.0 116793.0
257 76670 Milford 81542.0 81175.0 80861.0 79944.0
258 75132 Fate 170395.0 169459.0 168444.0 170266.0
259 76077 Rainbow 181330.0 181643.0 182545.0 183336.0
260 76070 Nemo 231314.0 231413.0 232180.0 233107.0
261 76041 Forreston 57992.0 57353.0 56790.0 56239.0
262 75101 Bardwell 66405.0 66070.0 65798.0 65030.0
263 75164 Josephine 135298.0 132978.0 130882.0 129858.0
264 75247 Dallas 63601.0 64413.0 65661.0 66486.0
265 76623 Italy 75081.0 74140.0 73564.0 72227.0
267 75443 Wolfe City 64460.0 64310.0 64202.0 64149.0
8/31/2012 9/30/2012 10/31/2012 11/30/2012 ... 6/30/2019 7/31/2019 \
247 188710.0 189067.0 189262.0 189311.0 ... 282526 285247
248 91444.0 92079.0 92699.0 93372.0 ... 175336 175537
249 82140.0 82538.0 83055.0 83513.0 ... 156399 157288
250 143754.0 143751.0 143484.0 143225.0 ... 216535 217095
251 87659.0 87734.0 87680.0 87395.0 ... 145116 146796
252 98250.0 98649.0 99194.0 100177.0 ... 327004 329233
253 161775.0 164560.0 167134.0 168733.0 ... 278116 279519
254 102290.0 102126.0 101190.0 100741.0 ... 194106 195111
255 212564.0 214330.0 215523.0 215524.0 ... 329797 330777
256 118231.0 118077.0 117976.0 117139.0 ... 206468 208849
257 79964.0 79930.0 80012.0 79695.0 ... 138725 140483
258 171958.0 173495.0 174985.0 176321.0 ... 255809 256148
259 184194.0 184793.0 184925.0 184955.0 ... 263277 264029
260 234198.0 235244.0 235723.0 235884.0 ... 341039 342594
261 56574.0 57194.0 57841.0 58746.0 ... 132736 136537
262 65020.0 65449.0 66438.0 66972.0 ... 120378 122477
263 130923.0 131237.0 131513.0 129257.0 ... 227984 229668
264 67114.0 66424.0 66933.0 67985.0 ... 161684 162669
265 71966.0 71275.0 71221.0 70947.0 ... 133798 136861
267 64317.0 64212.0 63877.0 63638.0 ... 102927 103267
8/31/2019 9/30/2019 10/31/2019 11/30/2019 12/31/2019 1/31/2020 \
247 287001 289639 288708 288140 287005 286685
248 175623 175988 176770 178108 179415 180880
249 158078 158372 157626 157506 157463 158783
250 218140 220208 219530 218837 218177 217261
251 148637 150113 151973 152757 153834 153920
252 330413 331588 332614 334071 336084 337555
253 281732 283425 285333 286934 290274 294328
254 194921 195538 195446 196406 196890 197767
255 330149 329735 328537 327282 325589 324878
256 210839 212454 212866 212720 212527 213056
257 142760 144342 145990 146909 148587 149044
258 256796 259416 261980 262918 262018 259337
259 264488 265131 265229 265403 265625 266141
260 343841 345900 347922 350157 350798 351524
261 139390 140314 141092 141845 143522 144888
262 125410 126797 127934 129004 131110 133553
263 230308 230572 230349 229630 229775 230174
264 163347 162760 162950 161813 161057 158345
265 139534 140826 142184 143704 146062 146992
267 103639 104107 104810 105310 105680 105832
2/29/2020 3/31/2020
247 286085 286026
248 182055 184041
249 159858 161384
250 215095 212702
251 153913 153177
252 339325 340427
253 298428 300931
254 198489 200690
255 323838 322124
256 214353 216802
257 149148 148052
258 257035 252296
259 266214 265927
260 351559 352046
261 146201 146602
262 135487 136271
263 231314 232111
264 155436 152259
265 147358 145880
267 105882 105685
[20 rows x 98 columns]
Data Shape: (263, 98)
Count rows with missing data: 0
# Add a column for 8-year ROI
dallas_8years['growth8yr'] = ((dallas_8years['3/31/2020'] - dallas_8years['4/30/2012']) / dallas_8years['4/30/2013'])
print(dallas_8years.tail(20))
print(dallas_8years.dtypes)
RegionName City 4/30/2012 5/31/2012 6/30/2012 7/31/2012 \
247 76462 Lipan 188766.0 189416.0 189239.0 189057.0
248 75422 Campbell 90660.0 90761.0 90903.0 91092.0
249 75423 Celeste 81210.0 81464.0 81682.0 81942.0
250 76476 Tolar 143654.0 144011.0 143833.0 143615.0
251 76651 Italy 87593.0 86780.0 86590.0 86580.0
252 75246 Dallas 96930.0 97052.0 97050.0 97706.0
253 76487 Poolville 159682.0 160014.0 160298.0 159932.0
254 76093 Rio Vista 103518.0 101994.0 101311.0 102334.0
255 76035 Cresson 208722.0 210255.0 210725.0 211193.0
256 76064 Maypearl 115115.0 115618.0 115884.0 116793.0
257 76670 Milford 81542.0 81175.0 80861.0 79944.0
258 75132 Fate 170395.0 169459.0 168444.0 170266.0
259 76077 Rainbow 181330.0 181643.0 182545.0 183336.0
260 76070 Nemo 231314.0 231413.0 232180.0 233107.0
261 76041 Forreston 57992.0 57353.0 56790.0 56239.0
262 75101 Bardwell 66405.0 66070.0 65798.0 65030.0
263 75164 Josephine 135298.0 132978.0 130882.0 129858.0
264 75247 Dallas 63601.0 64413.0 65661.0 66486.0
265 76623 Italy 75081.0 74140.0 73564.0 72227.0
267 75443 Wolfe City 64460.0 64310.0 64202.0 64149.0
8/31/2012 9/30/2012 10/31/2012 11/30/2012 ... 7/31/2019 8/31/2019 \
247 188710.0 189067.0 189262.0 189311.0 ... 285247 287001
248 91444.0 92079.0 92699.0 93372.0 ... 175537 175623
249 82140.0 82538.0 83055.0 83513.0 ... 157288 158078
250 143754.0 143751.0 143484.0 143225.0 ... 217095 218140
251 87659.0 87734.0 87680.0 87395.0 ... 146796 148637
252 98250.0 98649.0 99194.0 100177.0 ... 329233 330413
253 161775.0 164560.0 167134.0 168733.0 ... 279519 281732
254 102290.0 102126.0 101190.0 100741.0 ... 195111 194921
255 212564.0 214330.0 215523.0 215524.0 ... 330777 330149
256 118231.0 118077.0 117976.0 117139.0 ... 208849 210839
257 79964.0 79930.0 80012.0 79695.0 ... 140483 142760
258 171958.0 173495.0 174985.0 176321.0 ... 256148 256796
259 184194.0 184793.0 184925.0 184955.0 ... 264029 264488
260 234198.0 235244.0 235723.0 235884.0 ... 342594 343841
261 56574.0 57194.0 57841.0 58746.0 ... 136537 139390
262 65020.0 65449.0 66438.0 66972.0 ... 122477 125410
263 130923.0 131237.0 131513.0 129257.0 ... 229668 230308
264 67114.0 66424.0 66933.0 67985.0 ... 162669 163347
265 71966.0 71275.0 71221.0 70947.0 ... 136861 139534
267 64317.0 64212.0 63877.0 63638.0 ... 103267 103639
9/30/2019 10/31/2019 11/30/2019 12/31/2019 1/31/2020 2/29/2020 \
247 289639 288708 288140 287005 286685 286085
248 175988 176770 178108 179415 180880 182055
249 158372 157626 157506 157463 158783 159858
250 220208 219530 218837 218177 217261 215095
251 150113 151973 152757 153834 153920 153913
252 331588 332614 334071 336084 337555 339325
253 283425 285333 286934 290274 294328 298428
254 195538 195446 196406 196890 197767 198489
255 329735 328537 327282 325589 324878 323838
256 212454 212866 212720 212527 213056 214353
257 144342 145990 146909 148587 149044 149148
258 259416 261980 262918 262018 259337 257035
259 265131 265229 265403 265625 266141 266214
260 345900 347922 350157 350798 351524 351559
261 140314 141092 141845 143522 144888 146201
262 126797 127934 129004 131110 133553 135487
263 230572 230349 229630 229775 230174 231314
264 162760 162950 161813 161057 158345 155436
265 140826 142184 143704 146062 146992 147358
267 104107 104810 105310 105680 105832 105882
3/31/2020 growth8yr
247 286026 0.509167
248 184041 0.971818
249 161384 0.932538
250 212702 0.469890
251 153177 0.745213
252 340427 2.284684
253 300931 0.817271
254 200690 0.958531
255 322124 0.516546
256 216802 0.839584
257 148052 0.843843
258 252296 0.460035
259 265927 0.451213
260 352046 0.503407
261 146602 1.328705
262 136271 1.030746
263 232111 0.754612
264 152259 1.263060
265 145880 0.964603
267 105685 0.636120
[20 rows x 99 columns]
RegionName int64
City object
4/30/2012 float64
5/31/2012 float64
6/30/2012 float64
...
12/31/2019 int64
1/31/2020 int64
2/29/2020 int64
3/31/2020 int64
growth8yr float64
Length: 99, dtype: object
dallas_8years.sort_values(by=['growth8yr'], inplace=True, ascending=False)
print(dallas_8years.head(3))
RegionName City 4/30/2012 5/31/2012 6/30/2012 7/31/2012 \
222 75226 Dallas 68380.0 67838.0 67818.0 69115.0
102 75208 Dallas 81627.0 82459.0 83348.0 84955.0
237 75141 Hutchins 46001.0 45678.0 45116.0 45416.0
8/31/2012 9/30/2012 10/31/2012 11/30/2012 ... 7/31/2019 8/31/2019 \
222 70221.0 71066.0 71711.0 72548.0 ... 256599 257564
102 85737.0 86252.0 86627.0 87580.0 ... 301616 303055
237 45765.0 46855.0 47595.0 48173.0 ... 156644 157846
9/30/2019 10/31/2019 11/30/2019 12/31/2019 1/31/2020 2/29/2020 \
222 261018 264914 268522 269731 270038 270456
102 307351 312162 316847 318606 321020 323218
237 159603 160230 160677 160674 161414 162083
3/31/2020 growth8yr
222 271087 2.653129
102 325442 2.634699
237 162905 2.404838
[3 rows x 99 columns]